
Software systems and computational methods
Reference:

Mal'shakov G.V., Mal'shakov V.D. Technique of normalization of the alphabet of search for quality improvement of entity identification based on data frequency characteristics

Abstract: Using the frequency distributions of data as an identifier, it is possible to find the data of one system in other systems with which it must interact, and so coordinate their work. Entity identification in a subject domain is then performed using a search alphabet: a set of lexemes together with the frequencies of their use in the data, stored as records of a relational database. The object of the research is a technique for normalizing the search alphabet to improve the quality of entity identification in a subject domain based on the frequency characteristics of the entities' data. The technique deletes those lexemes of the alphabet that occur inside other lexemes of the alphabet with a similar frequency of repetition in the entity. The research methods include systems analysis, information theory, the theory of algorithms, the algebra of logic, set theory, comparative analysis, data mining methods, and methods of software and database development. The authors show experimentally (on a sample of 178 entities) that the technique reduces the size of the search alphabet by a factor of five on average, which considerably speeds up the identification of entities based on the frequency characteristics of their data. By reducing the number of shorter lexemes, the normalization technique also reduces the recognition error by 0.02036 per identification on average, as the experiments show.
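The normalization rule described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the search alphabet of one entity is held in memory as a dict mapping each lexeme to its frequency, and it assumes that "similar frequency" means a relative difference no larger than a tolerance. The function names `similar` and `normalize_alphabet`, the `tolerance` parameter and its default of 0.1, and the sample lexemes are all hypothetical; the abstract does not say how similarity of frequencies is measured.

```python
from typing import Dict


def similar(f1: float, f2: float, tolerance: float) -> bool:
    """Hypothetical similarity test: the relative difference of the two
    frequencies must not exceed `tolerance` (an assumption, not from the paper)."""
    larger = max(f1, f2)
    if larger == 0:
        return True  # two zero frequencies are treated as identical
    return abs(f1 - f2) / larger <= tolerance


def normalize_alphabet(alphabet: Dict[str, float],
                       tolerance: float = 0.1) -> Dict[str, float]:
    """Drop every lexeme that occurs inside a longer lexeme of the same
    alphabet whose frequency is similar to its own; keep all others."""
    kept: Dict[str, float] = {}
    for lexeme, freq in alphabet.items():
        # Compare against the ORIGINAL alphabet, so chains such as
        # "iva" inside "ivan" inside "ivanov" are resolved in a single pass.
        redundant = any(
            lexeme != other
            and lexeme in other  # substring test; `other` is therefore longer
            and similar(freq, other_freq, tolerance)
            for other, other_freq in alphabet.items()
        )
        if not redundant:
            kept[lexeme] = freq
    return kept


if __name__ == "__main__":
    # Hypothetical alphabet of one entity: lexeme -> occurrence count.
    alphabet = {"ivanov": 10, "ivan": 10, "iva": 10, "ov": 25, "petrov": 8}
    print(normalize_alphabet(alphabet))
```

Running the example prints `{'ivanov': 10, 'ov': 25, 'petrov': 8}`: `ivan` and `iva` are dropped because their frequency matches that of `ivanov`, which contains them, while `ov` is kept because its frequency differs greatly from that of every lexeme containing it. If frequencies are raw occurrence counts over the same data, a lexeme can never be less frequent than a longer lexeme containing it, so a similar frequency suggests the shorter lexeme rarely occurs on its own and adds little identifying information. The pairwise check is quadratic in the alphabet size, which is acceptable for a sketch.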


Keywords:

correlation, frequency analysis of data, entity, search, alphabet, normalization, database, software, identification, method


This article can be downloaded freely in PDF format.

