Kopyrin A.S., Makarova I.L. —
Algorithm for preprocessing and unification of time series based on machine learning for data structuring
// Software systems and computational methods. – 2020. – ¹ 3.
– P. 40 - 50.
DOI: 10.7256/2454-0714.2020.3.33958
URL: https://en.e-notabene.ru/itmag/article_33958.html
Read the article
Abstract: The subject of the research is the process of collecting and preliminary preparation of data from heterogeneous sources. Economic information is heterogeneous and semi-structured or unstructured in nature. Due to the heterogeneity of the primary documents, as well as the human factor, the initial statistical data may contain a large amount of noise, as well as records, the automatic processing of which may be very difficult. This makes preprocessing dynamic input data an important precondition for discovering meaningful patterns and domain knowledge, and making the research topic relevant.Data preprocessing is a series of unique tasks that have led to the emergence of various algorithms and heuristic methods for solving preprocessing tasks such as merge and cleanup, identification of variablesIn this work, a preprocessing algorithm is formulated that allows you to bring together into a single database and structure information on time series from different sources. The key modification of the preprocessing method proposed by the authors is the technology of automated data integration.The technology proposed by the authors involves the combined use of methods for constructing a fuzzy time series and machine lexical comparison on the thesaurus network, as well as the use of a universal database built using the MIVAR concept.The preprocessing algorithm forms a single data model with the ability to transform the periodicity and semantics of the data set and integrate data that can come from various sources into a single information bank.
Ignatenko A.M., Makarova I.L., Kopyrin A.S. —
Methods for preparing data for the analysis of poorly structured time series
// Software systems and computational methods. – 2019. – ¹ 4.
– P. 87 - 94.
DOI: 10.7256/2454-0714.2019.4.31797
URL: https://en.e-notabene.ru/itmag/article_31797.html
Read the article
Abstract: The aim of the study is to prepare for the analysis of poorly structured source data, their analysis, the study of the influence of data "pollution" on the results of regression analysis. The task of structuring data, preparing them for a qualitative analysis is a unique task for each specific set of source data and cannot be solved using a general algorithm, it will always have its own characteristics. The problems that may cause difficulties when working (analysis, processing, search) with poorly structured data are considered. Examples of poorly structured data and structured data that are used in the preparation of data for analysis are given. These algorithms for preparing weakly structured data for analysis are considered and described. The cleaning and analysis procedures on the data set were carried out. Four regression models were constructed and compared. As a result, the following conclusions were formulated: Exclusion from the analysis of various kinds of suspicious observations can drastically reduce the size of the population and lead to an unreasonable decrease in variation. At the same time, such an approach would be completely unacceptable if, as a result, important objects of observation are excluded from the analysis and the integrity of the population is violated. The quality of the constructed model may deteriorate in the presence of abnormal values, but may also improve due to them.