Trub I. —
On approximation of the output of a probabilistic model of hierarchical bit indices
// Software systems and computational methods. – 2018. – ¹ 4.
– P. 102 - 113.
DOI: 10.7256/2454-0714.2018.4.27809
URL: https://en.e-notabene.ru/itmag/article_27809.html
Read the article
Abstract: The subject of the study is a probabilistic model of hierarchical bit indexes of databases. The object of the study is the output of the model — a three-parameter discrete distribution of the number of indexes for implementing queries to the database, parametrized by the intensity of recording records in the database, the average query length, and the size of a large index. The author considers such aspects of the topic as the choice of a hypothesis from known theoretical distributions, a method for testing a hypothesis, selection of functions for approximating the dependence of the expectation on the third parameter, selection of a function for approximating the dependence of the minimum point of the expectation for the third parameter from the first two. The study of such dependencies is explained by the fact that the optimal choice of the third parameter is the goal of the designer, and the first two are the initial data of the model. The methodology of the research is the methods of mathematical statistics, in particular, the estimation of parameters and the Pearson criterion of testing hypotheses, methods for constructing the best approximations, in particular, the method of least squares, the theory of curves of the third order. The main conclusions of the study: the best approximation for the studied family of distributions is the Polya distribution; The best approximations for the dependence of the expectation on the third parameter are the Bacon-Watts model and the heat capacity model. A special contribution of the author to the study of the topic is the derivation of an empirical formula that has practical significance. It allows the designer on the basis of the first two parameters at once, without using cumbersome calculations on the model, to obtain an approximate optimal value of the third parameter and thus construct an index of the database of the optimal size. The novelty of the research lies in obtaining approximate dependencies for a new type of distribution that cannot be described by a closed formula.
Trub I. —
Probabilistic model of hierarchical database indexes
// Software systems and computational methods. – 2017. – ¹ 4.
– P. 15 - 31.
DOI: 10.7256/2454-0714.2017.4.24437
URL: https://en.e-notabene.ru/itmag/article_24437.html
Read the article
Abstract: The subject of the study is the concept of hierarchical bitmap-indexes proposed by the author. It is that in order to improve the processing performance of queries on the time filter, the indices are supported not only for the values of the basic unit of time, but also for arbitrary larger multiple units. The object of the study is the construction of an analytic probability model of such indices for the particular case of the exponential distribution of a random stream of recording records in a database. The author focuses on such an aspect as the calculation of the discrete distribution of the number of indices involved in the processing of the query. The methodology of the study is probability theory, combinatorial methods, measure theory, computational experiment. In addition, it is shown that the latest concepts of the theory of cellular automata, such as the Zaitsev's neighborhood, can be used to study the features of the proposed model. The main results of the work can be formulated as follows: introduced an original, intuitive concept of building indexes; new, meaningful optimization problems for selecting a hierarchical index system are formulated; a mathematical model is constructed and verified, allowing to estimate the efficiency of using the chosen hierarchy of indices. It is shown that in the limiting case the model naturally tends to a set of fractal nature, in particular, one of the varieties of Cantor dust, for which the formula for calculating its Hausdorff-Besicovitch dimension is derived through the application parameters of the initial problem.
Trub I. —
Numerical Modelling of the General Task of Bitmap-Indices Distribution
// Software systems and computational methods. – 2017. – ¹ 3.
– P. 35 - 53.
DOI: 10.7256/2454-0714.2017.3.22952
URL: https://en.e-notabene.ru/itmag/article_22952.html
Read the article
Abstract: The subject of the research is the mathematical model in the form of recurrent integral relation system that describes the distribution of unit intervals where at least one random stream event with the arbitrary distribution function has happened. The author of the article examines numerous aspects of the numerical implementation of this system such as the Laplace transformation access method, numerical integration near discontinuity points, stability of calculations, validity check results, and particularities of real number machine arithmetic. Trub pays special attention to the connection between calculation data and semantics of the applied problem which solution is represented by these data. The methodology of the research is based on the probability theory (distribution types and qualities), numerical mathematics methods (numerical integration, interpolation, Laplace transformation), software implementation of the mathematical model and computing experiment conduction. The main conclusions of this research is the validity and numerical implementability of the mathematical model created by the author of the article as well as substantiation of the numerical solution for arbitrary distribution of the random stream events distribution. The novelty of the research is caused by the fact that the author develops a numerical solution of the bitmap-indices distribution for Weibull distribution, gamma distribution, logarithmically normal distribution, etc., and analyzes dependencies of different kinds such as the indices density function and average number of indices for the specified interval length.
Trub I. —
On the distribution of bitmap indexes quantity for an arbitrary flow of writing data to a database
// Software systems and computational methods. – 2017. – ¹ 1.
– P. 11 - 21.
DOI: 10.7256/2454-0714.2017.1.21790
URL: https://en.e-notabene.ru/itmag/article_21790.html
Read the article
Abstract: The subject of the study is the problem of calculating distribution of probabilities of the number of unit intervals for a time interval of a given length, on which at least one event of some random flow predetermined by arbitrary probability distribution occurs. The applied value of this task is that its solution gives an estimate of the number of bitmap-indexes falling into the result of a query to a database filtered by a time range. The object of the study is the mathematical model of the problem constructed by means of the theory of probability, and the methods of obtaining numerical results with its usage. The methodology of the research is a detailed decomposition of the random flow of events, taking into account the applied logic of the problem being solved and the derivation on the basis of methods of probability theory of equations satisfied by the desired distribution function. An important part of the methodology is the use of the Laplace transform apparatus and the application of the convolution theorem. The main result of the study is the formulation of the original problem of queuing theory and the derivation of its solution in the form of a system of integral recurrence equations for a set of auxiliary piecewise continuous functions whose totality allows to obtain the final function. The author formulates a computational algorithm for solving the resulting system. The article shows that, for the case of an exponential distribution, the solution immediately obtained, because of its simplicity, satisfies the system of integral equations.
Trub I. —
Analytical probabilistic modeling of bitmap-indexes
// Software systems and computational methods. – 2016. – ¹ 4.
– P. 315 - 323.
DOI: 10.7256/2454-0714.2016.4.21091
Read the article
Abstract: The study is devoted to bitmap-indexes as a tool of improving efficiency of processing search queries and reporting in the current database. The subject of research is mathematical model of dependence of the number of indexes, required to build a sample that meets the request, on the intensity of adding records to the database and query the specified range of values. This characteristic is most significant for evaluating query processing performance because it determines the number of disjunction operations on bit strings, required to get a result set. This problem arose entirely from the practical needs due to the critical impact of the speed of building of reports on customer value commercial products - database applications. The methodology of this study is probabilistic analytical modeling based on representations of the original data in the form of Poisson process and the use of the apparatus of mathematical analysis (integral calculus and summation rows) to get the final results. The novelty of the research is to develop a suggested mathematical model, which allows to put a wide range of problems of the analysis and optimization. The problem is solved – the author presents the formula for the distribution of the number of indexes, and the average number of indexes in a single query. For each result author evaluated reliability on the basis of an alternative approach or plausible reasoning. The paper sets the tasks of constructing a probabilistic model for the distribution of any type of query processing and optimization using hierarchical bitmap-indexes. It should be noted, that formulated problem and the results obtained have an independent theoretical value within the queuing theory without regard to the application area.