Software systems and computational methods
Reference:
Malakhov S.V., Yakupov D.O.
Investigation of stochastic models of packet generation in computer networks
// Software systems and computational methods.
2024. No. 2.
P. 53-72.
DOI: 10.7256/2454-0714.2024.2.70340 EDN: EKXYBU URL: https://en.nbpublish.com/library_read_article.php?id=70340
Investigation of stochastic models of packet generation in computer networks
DOI: 10.7256/2454-0714.2024.2.70340
EDN: EKXYBU
Received: 02-04-2024
Published: 02-06-2024
Abstract: Stochastic packet generation models are used to generate traffic with specified characteristics in computer networks. They can be used to simulate network activity and to test network performance. In standard network data transmission, packets are generated with delays, that is, they are sent at certain intervals. Various stochastic models can be used to generate these delays, including the uniform, exponential, and Erlang distributions. In this work, an experimental setup was assembled and a client-server application was developed to study and analyze the performance of the data transmission channel. An algorithm is proposed for restoring the moment characteristics of the random interval between packets for subsequent use in queuing models. The influence of the distribution laws on the performance of the experimental network was analyzed, and estimates of the efficiency of channel use and of the average packet generation time in network segments were obtained, together with histograms of delays for each distribution law. The generation of packets with delays according to stochastic distribution laws (uniform, exponential, Erlang) is of great importance in modeling and analyzing the operation of network systems.
The generation of packets with delays according to the above-mentioned distribution laws also allows network applications and devices to be tested and debugged in conditions close to real ones, which makes it possible to identify problems and improve the operation of network systems.
Keywords: uniform distribution, exponential distribution, Erlang distribution, packet switching, delays, client-server application, traffic analysis, data transmission, packet generation, stochastic models
This article is automatically translated.
Introduction
Packet generation is the process of creating and sending data packets over a network. It is an important function used in modern computer networks to transfer information between devices. Packet generation is carried out by software that creates and sends data packets with specified parameters. Packet generation in computer networks allows you to solve the following tasks:
The purpose of packet generation is to ensure efficient data transmission on the network. This is achieved by splitting large amounts of data into smaller blocks that can be transmitted over the network more quickly and reliably. In addition, packet generation allows you to manage the flow of data and monitor the quality of service on the network. The advantages of packet generation include high data transfer rates, efficient use of network resources, and the ability to ensure the reliability and security of data transmission. Packet generation also allows you to optimize your network infrastructure and improve the performance of network applications.
The initial moments and the kurtosis are two main characteristics that can be used to describe the distribution of traffic in the network [1]. The initial (raw) moment of order k is the mathematical expectation of the k-th power of a random variable. The initial moment of the first order is the mean, and the variance equals the initial moment of the second order minus the square of the mean. In the context of network traffic, the initial moments can be used to describe the distribution of the volume or duration of packets in traffic. For example, the first and second initial moments together determine the variance of packet sizes in traffic.
Kurtosis is a numerical characteristic that describes the shape of the data distribution. Measured relative to the normal distribution (excess kurtosis), it can be positive, negative, or zero. A positive kurtosis means that the distribution has heavier tails and a higher peak than the normal distribution. A negative kurtosis means that the distribution has lighter tails and a lower peak than the normal distribution. A kurtosis of zero means that the tails and the peak are comparable to those of a normal distribution. In the context of network traffic, kurtosis can be used to describe the shape of the packet size distribution in traffic.
For example, a positive kurtosis may indicate the presence of heavy tails in the distribution of packet sizes, which may indicate the presence of large packets or unusual events in traffic. A negative kurtosis may indicate the presence of light tails in the distribution of packet sizes, which may mean the absence of large packets and uniformity of traffic. In addition, the initial moment and kurtosis can be used to compare different types of traffic or different network devices. For example, an analysis of the initial moment and kurtosis can help in comparing traffic passing through different routers or switches. This can help identify performance issues with network devices or determine the optimal network configuration.
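As an illustration of how kurtosis separates light-tailed from heavy-tailed data, the following minimal Python sketch estimates the excess kurtosis of two synthetic samples. The function name `excess_kurtosis`, the sample sizes, and the seed are assumptions made for this example and are not taken from the original study.

```python
import random

def excess_kurtosis(samples):
    """Sample excess kurtosis: fourth central moment / variance^2, minus 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0

rng = random.Random(1)  # fixed seed so that the sketch is repeatable
uniform_sample = [rng.uniform(0.0, 1.0) for _ in range(100_000)]
exponential_sample = [rng.expovariate(1.0) for _ in range(100_000)]

# Theoretical excess kurtosis: -1.2 for uniform (light tails),
# 6 for exponential (heavy right tail). The estimates should be close.
print(excess_kurtosis(uniform_sample))
print(excess_kurtosis(exponential_sample))
```

The same function could be applied to packet sizes or inter-packet intervals exported from a traffic capture.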
Problem statement
In the modern world, computer networks play a key role in the transmission and processing of information. One of the important tasks in the design and optimization of such networks is traffic modeling, in particular, the generation of data packets. Stochastic packet generation models are one of the most commonly used approaches to this problem. However, existing models do not always adequately reflect the actual conditions of data transmission in computer networks, which leads to inaccuracies in forecasting and optimizing network parameters. The purpose of this work is to study stochastic models of packet generation in computer networks based on the uniform, exponential, and Erlang distributions. Theoretical and experimental analysis of these models is carried out, together with a comparison of their accuracy and efficiency in modeling real traffic.
A uniform distribution is a probability distribution in which all values in a given range have the same probability of occurrence. For example, if an integer from 1 to 6 is generated with a uniform distribution, then each number has a probability of 1/6 of being selected. Uniform distribution is widely used in various fields, including statistics, physics, economics, engineering, computer science and many others. It is one of the simplest and most understandable types of probability distribution, and therefore is often used as a model of a random process. The uniform distribution can be discrete or continuous. In the case of a continuous uniform distribution, all values in a certain interval are equally probable and the distribution is given by the formula f(x) = 1/(b-a) for a ≤ x ≤ b, where a and b are the ends of the interval, and f(x) is the probability density at point x.
Uniform distribution can also be used to generate pseudorandom numbers, which are used in computer applications to generate random sequences. A uniform distribution has several properties that can be useful in its analysis or use in modeling. Some of these properties include:
Uniform distribution can also be used to solve various problems, such as determining the probability that a random variable will fall within a certain range of values, or to find optimal values in optimization problems. To generate packets using uniform distribution, the following steps must be performed:
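Generating uniformly distributed inter-packet delays can be sketched as follows. This is a minimal illustration, not the authors' application: the function name `uniform_delays`, the 100-500 ms interval, and the seed are hypothetical values chosen for the example.

```python
import random

def uniform_delays(a_ms, b_ms, count, seed=None):
    """Draw `count` inter-packet delays (ms) uniformly from [a_ms, b_ms]."""
    if not a_ms < b_ms:
        raise ValueError("a_ms must be smaller than b_ms")
    rng = random.Random(seed)
    return [rng.uniform(a_ms, b_ms) for _ in range(count)]

# Hypothetical parameters: delays between 100 ms and 500 ms.
delays = uniform_delays(100.0, 500.0, count=10_000, seed=42)
mean_delay = sum(delays) / len(delays)

# The sample mean should be close to the theoretical mean (a + b) / 2.
print(f"mean delay: {mean_delay:.1f} ms, theory: {(100.0 + 500.0) / 2:.1f} ms")
```

A sender would wait for each generated delay (for example with `time.sleep(delay / 1000)`) before transmitting the next packet.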
An exponential distribution is a statistical probability distribution that describes the time between two consecutive events in a Poisson process. For example, the time between receiving two data packets in computer networks can be described by an exponential distribution. The main properties of the exponential distribution:
The exponential distribution is memoryless. This means that the probability of an event occurring within a given time from now does not depend on how much time has passed since the previous event. For example, if the average time between two calls in a call center is 5 minutes, then the probability that the next call will occur within the next minute is 1 - exp(-1/5) ≈ 0.18, regardless of how much time has passed since the previous call. It is worth noting that the exponential distribution is one of the main distributions used in statistical data analysis, and its properties can be used to evaluate the parameters of other distributions and to test statistical hypotheses. To generate packets using exponential distribution, follow these steps:
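Exponentially distributed delays are commonly generated by inverse-transform sampling. The sketch below illustrates that technique under assumed parameters; the function name `exponential_delay`, the 300 ms mean, and the seed are hypothetical and not taken from the original application.

```python
import math
import random

def exponential_delay(mean_ms, rng):
    """Inverse-transform sampling: T = -mean * ln(1 - U), U uniform on [0, 1)."""
    u = rng.random()
    return -mean_ms * math.log(1.0 - u)

rng = random.Random(7)  # fixed seed so that the sketch is repeatable
delays = [exponential_delay(300.0, rng) for _ in range(50_000)]
mean_delay = sum(delays) / len(delays)
print(f"mean delay: {mean_delay:.1f} ms (target 300.0 ms)")

# Probability that an event occurs within time t when the mean is m:
# P(T <= t) = 1 - exp(-t / m). For m = 5 minutes and t = 1 minute:
p_within_one_minute = 1.0 - math.exp(-1.0 / 5.0)
print(f"P(T <= 1 min) = {p_within_one_minute:.3f}")  # 0.181
```

Python's standard library also provides `random.expovariate(rate)`, which draws from the same distribution with `rate = 1 / mean`.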
The Erlang distribution is a mathematical model used to describe the time until the k-th event in a Poisson process, that is, the sum of k consecutive intervals between events [3]. The Poisson process is a stochastic process in which random events occur independently of each other with constant intensity. The Erlang distribution gives the probability density of the waiting time until k events have occurred, whereas the number of events in a fixed time interval follows the Poisson distribution. This distribution can be used to model the time between calls in the call center, the time between the arrival of customers in the store, or the time between two consecutive triggers of a certain event in the production process. In general, the Erlang distribution can be represented as the sum of k independent random variables, each of which has an exponential distribution with the parameter λ. The Erlang distribution is a special form of gamma distribution. The parameter k in the Erlang distribution is called the shape of the distribution, and the parameter λ is called the intensity. The mathematical expectation of the Erlang distribution is k/λ, and the variance is k/λ². There are several methods for calculating probabilities for the Erlang distribution, including the use of tables and special software packages. If the Poisson process is heterogeneous, then an Erlang distribution with variable intensity can be used. In this case, the parameter λ will depend on time. If the number of events that can occur in a fixed time interval is unlimited, then the Weibull distribution or the Pareto distribution can be used instead of the Erlang distribution. To generate packets using the Erlang distribution, the following steps must be performed:
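Because an Erlang variable is the sum of k independent exponential variables, Erlang delays can be generated directly from that definition. The following minimal sketch assumes a shape of k = 3 and an intensity of 0.01 events per millisecond; these values, the function name `erlang_delay`, and the seed are hypothetical.

```python
import random

def erlang_delay(k, rate, rng):
    """Sum of k independent exponential variables with intensity `rate`."""
    return sum(rng.expovariate(rate) for _ in range(k))

rng = random.Random(3)  # fixed seed so that the sketch is repeatable
k, rate = 3, 0.01       # assumed parameters: mean k/rate = 300 ms
delays = [erlang_delay(k, rate, rng) for _ in range(50_000)]

mean = sum(delays) / len(delays)
variance = sum((d - mean) ** 2 for d in delays) / len(delays)

# Theory: expectation k / rate, variance k / rate^2.
print(f"mean {mean:.1f} ms, theory {k / rate:.1f} ms")
print(f"variance {variance:.0f}, theory {k / rate ** 2:.0f}")
```

With k = 1 the function reduces to the exponential case, and larger k gives a more regular flow for the same mean.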
Methodology and research conditions
An experimental setup was assembled, consisting of a client and a server on which the Wireshark software for analyzing incoming traffic was installed [4] (Fig. 1).
Fig. 1 – Experimental setup
The experimental setup consists of:
The developed client-server application is written in the Java programming language. Program size: 774 KB. No installation is required [5]. Supported operating systems for the client and server applications: MS Windows 7-11, Mac OS X, Linux. To run the application on the client and on the server, an implementation of the Java platform specification that includes a compiler and class libraries (JDK 17) must be installed. The client generates data packets and sends them to the server (Fig. 2). The server receives the packets (Fig. 3), and the received traffic is analyzed using the Wireshark program.
Fig. 2 – Generation of packets by the client with the exponential distribution
In this window, the following generation parameters are specified: server address and port, packet size, packet generation time, file size, distribution law, distribution law parameters, and traffic type.
Fig. 3 – Receiving packets on the server
In this window, the server is configured and the port and the type of traffic are specified.
The experiment consists of 3 main stages:
1. Select the distribution law for each experiment in turn (Erlang, exponential, uniform).
2. In each experiment, set the packet size. Three packet sizes were selected, with payload sizes S (64 bytes), M (512 bytes), and L (1024 bytes).
3. Generate packets for 120 seconds (2 minutes).
Results and analysis of the study
Using well-known formulas of mathematical statistics, the moment characteristics of time intervals are determined. The work uses statistics up to the 3rd order, which allow us to judge the nature of the distribution of intervals [5,6].
The average value of the interval between adjacent packets (1) is calculated to estimate the frequency with which packets are transmitted over the network. By evaluating this parameter, you can control the flow of data, which is important for maintaining the quality of services on the network.
$$\bar{\tau} = \frac{1}{N}\sum_{i=1}^{N}\left(t_{i+1} - t_i\right) \qquad (1)$$
where $t_i$ are the time points of packet receipt; $N$ is the number of analyzed intervals.
Sample variance (2) is a statistical parameter that shows how far the elements of the sample deviate from their average value. It reflects the degree of data dispersion in the sample and is useful for understanding how much the data vary.
$$D_\tau = \overline{\tau^2} - \bar{\tau}^2 \qquad (2)$$
where $\overline{\tau^2}$ is the second initial moment.
The second initial moment (3) of a statistical quantity is the mathematical expectation of the square of this quantity. It can be used to estimate the spread of observed values relative to zero. This is an important characteristic in the analysis of random variables, especially in statistical modeling and forecasting.
$$\overline{\tau^2} = \frac{1}{N}\sum_{i=1}^{N}\left(t_{i+1} - t_i\right)^2 \qquad (3)$$
The coefficient of variation (4) is a statistical indicator that determines the relative variation or spread of observations in a given sample. The higher the coefficient of variation, the greater the spread of data in the sample. It is used to compare the level of variability of different data.
$$c_\tau = \frac{\sigma_\tau}{\bar{\tau}} \qquad (4)$$
where $\sigma_\tau$ is the standard deviation (the square root of the sample variance) and $\bar{\tau}$ is the average interval.
Asymmetry (5) is a statistical indicator that characterizes the degree of deviation of the sample distribution from the symmetrical one. If the asymmetry is positive, the distribution has a longer right tail; if negative, a longer left tail. The asymmetry is zero for a symmetric distribution. The calculation of asymmetry is especially important in data analysis in order to choose the right statistical approach.
$$As_\tau = \frac{\overline{\tau^3} - 3\,\overline{\tau^2}\,\bar{\tau} + 2\,\bar{\tau}^3}{\sigma_\tau^3} \qquad (5)$$
where $\overline{\tau^3}$ is the third initial moment and $\overline{\tau^2}$ is the second initial moment.
The third initial moment (6) is the mathematical expectation of the cube of a random variable. It is used to calculate the asymmetry coefficient, which shows whether the distribution of the dataset is skewed and, if so, in which direction.
$$\overline{\tau^3} = \frac{1}{N}\sum_{i=1}^{N}\tau_i^3 \qquad (6)$$
where $\tau_i$ is the i-th interval between adjacent packets and $N$ is the number of analyzed intervals.
To calculate the moment characteristics, data from the Wireshark log files were imported into MS Excel. Only packets sent from the client to the server were taken into account. Table 1 shows the data of the developed client-server application, and Table 2 shows the data of the Netem program [7].
Table 1. Average moment characteristics of the interval between packets according to the laws of distribution using the developed client-server application
Table 2. Average moment characteristics of the interval between packets according to the laws of distribution using the Netem program
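The moment characteristics described above can also be computed programmatically from packet arrival timestamps rather than in a spreadsheet. The following Python sketch is one possible implementation under stated assumptions: the function name `interval_moments`, the dictionary keys, and the sample timestamps are hypothetical, and the formulas are the standard ones from mathematical statistics (variance as the second initial moment minus the squared mean, asymmetry expressed through the first three initial moments).

```python
import math

def interval_moments(timestamps):
    """Moment characteristics of the intervals between consecutive timestamps.

    Assumes at least three timestamps and intervals that are not all equal
    (otherwise the standard deviation is zero and the ratios are undefined).
    """
    if len(timestamps) < 3:
        raise ValueError("at least three timestamps are required")
    intervals = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    n = len(intervals)
    m1 = sum(intervals) / n                   # mean interval
    m2 = sum(x ** 2 for x in intervals) / n   # second initial moment
    m3 = sum(x ** 3 for x in intervals) / n   # third initial moment
    variance = m2 - m1 ** 2
    sigma = math.sqrt(variance)
    return {
        "mean": m1,
        "second_moment": m2,
        "third_moment": m3,
        "variance": variance,
        "cv": sigma / m1,                     # coefficient of variation
        "asymmetry": (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / sigma ** 3,
    }

# Hypothetical arrival times in seconds; the intervals are 1, 2, 3, 4.
stats = interval_moments([0.0, 1.0, 3.0, 6.0, 10.0])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

For this symmetric set of intervals the asymmetry is zero; timestamps exported from a Wireshark capture could be passed to the same function.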
Based on the calculation of the average moment characteristics of the interval between packets according to the laws of distribution using the developed client-server application and the Netem program, it can be concluded that the developed application generates packets with delays that follow the distribution laws more accurately. The accuracy difference is ~14% in favor of the developed client-server application. The coefficient of variation shows the difference between the traffic and the Poisson flow and, together with the asymmetry, allows us to judge how heavy the tails of the distribution of intervals between packets are. The second initial moment is the average value of the square of a random variable; together with the mean, it determines the variance, that is, a measure of the spread of values around the average value. For a given mean, the higher the value of the second initial moment, the greater the spread of values and the less predictable the packet generation process is. The smallest variation is observed in a uniform distribution. The third initial moment of a random variable shows the average value of its cube and enters the asymmetry coefficient, which indicates whether values at different distances from the average value are distributed symmetrically. For a uniform distribution, the third initial moment is the lowest. The asymmetry of a random variable shows how much its distribution differs from the symmetric one. If the asymmetry is zero, then the distribution is symmetric. If the asymmetry is greater than zero, then the distribution is skewed to the right and has a heavier right tail. For a uniform distribution, the asymmetry is almost equal to 0; for the Erlang and exponential distributions it is greater than 0. The coefficient of variation is the ratio of the standard deviation to the average value; it equals 1 for exponentially distributed intervals (a Poisson flow), and smaller values indicate a more regular flow.
Thus, the higher the value of the coefficient of variation, the higher the level of variability of the random variable. In a uniform distribution, the coefficient of variation is the smallest [8,9]. Histograms of distributions were constructed based on the obtained delays. With the Erlang distribution (Fig. 4), the largest number of delays falls in the 336-430 ms interval. For longer intervals, the number of delays decreases.
Fig. 4 – The resulting histogram of the Erlang distribution
With the exponential distribution (Fig. 5), most delays fall in the shortest time intervals.
Fig. 5 – The resulting histogram of the Exponential distribution
With a uniform distribution (Fig. 6), the delays are about 300 ms across the different intervals, which is significantly worse than for the distributions above.
Fig. 6 – The resulting histogram of Uniform distribution
The volume transmitted in 2 minutes for each distribution law included both useful data and service data. Depending on the packet size, the efficiency of the payload varied [10] (Fig. 7). Maximum efficiency was achieved with large packet sizes.
Fig. 7 – Efficiency of the payload (S – 64 bytes, M – 512 bytes, L – 1024 bytes)
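The dependence of payload efficiency on packet size can be illustrated with a short calculation. The sketch below assumes a fixed per-packet overhead of 32 bytes, which is inferred from the Conclusion (a 1056-byte packet carrying 1024 bytes of payload); applying the same overhead to the S and M sizes is an assumption, and the function name `payload_efficiency` is hypothetical.

```python
def payload_efficiency(payload_bytes, overhead_bytes=32):
    """Share of useful data in a packet: payload / (payload + overhead)."""
    return payload_bytes / (payload_bytes + overhead_bytes)

# Payload sizes used in the experiment; the 32-byte overhead is an assumption.
for label, payload in (("S", 64), ("M", 512), ("L", 1024)):
    print(f"{label} ({payload} bytes): {payload_efficiency(payload):.1%}")
```

Under this assumption the efficiency grows with the payload size, which agrees with the observation that maximum efficiency was achieved with large packets.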
During the experiment, small packet losses occurred during data transfer, on average fewer than 50 packets (<0.01%). In the process of generating packets from the client, incoming traffic on the server loaded the channel. Depending on the packet size and the distribution law, the load had different values (Fig. 8). With the Erlang distribution, the channel load turned out to be the largest, which indicates the greater efficiency of this model.
Fig. 8 – Channel load according to the laws of distribution (S – 64 bytes, M – 512 bytes, L – 1024 bytes)
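Channel load can be estimated as the ratio of the transmitted bits to the capacity of the link over the measurement time. The sketch below shows this calculation with hypothetical numbers: the 100 Mbit/s link rate and the transferred volume are assumptions chosen for illustration and are not measurements from the experiment; only the 120-second duration comes from the text.

```python
def channel_load(total_bytes, duration_s, link_rate_bps):
    """Fraction of link capacity used: transmitted bits / (rate * time)."""
    return (total_bytes * 8) / (link_rate_bps * duration_s)

# Hypothetical example: 450,000,000 bytes sent in 120 s over a 100 Mbit/s link.
load = channel_load(total_bytes=450_000_000, duration_s=120, link_rate_bps=100_000_000)
print(f"channel load: {load:.1%}")
```

The total volume should include both payload and service data, since both occupy the channel.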
It is believed that for loaded Ethernet and Fast Ethernet systems, a good value for network utilization is 30%. This value corresponds to the absence of prolonged network downtime and provides sufficient margin in case of peak load increase [11]. The traffic generated by the Erlang distribution with a payload size of 1024 bytes (29.5%) was the most efficient in terms of channel utilization. The average packet generation time confirms this result. Depending on the size of the packet and the distribution law, the time had different values (Fig. 9) [12]. The Erlang distribution model showed the lowest average packet generation time.
Fig. 9 – Average packet generation time according to distribution laws (S – 64 bytes, M – 512 bytes, L – 1024 bytes)
Conclusion
The generation of packets with delays according to stochastic distribution laws (uniform, exponential, Erlang) is of great importance in modeling and analyzing the operation of network systems. It also allows network applications and devices to be tested and debugged in conditions close to real ones, which makes it possible to identify problems and improve the operation of network systems. As a result of the experiment, an algorithm was proposed for restoring the moment characteristics of the random interval between packets for subsequent use in queuing models. The influence of the distribution laws on the performance of the experimental network was also analyzed, and estimates of the efficiency of channel use and of the average packet generation time in network segments were obtained, together with histograms of delays for each distribution law. The highest efficiency of the payload is observed with a packet size of 1056 bytes (1024 bytes of payload). The highest channel performance is observed with the Erlang distribution at payload sizes of 512 bytes and 1024 bytes. This is also confirmed by the average generation time of one packet, which is lowest for the Erlang distribution.
References
1. Zhukova, G.N. (2015). A Map of Skewness and Kurtosis Coefficients in Teaching Probability Theory and Mathematical Statistics. Concept, 8, 1-4.
2. Dmitriev, E.I., & Medvedev, A.V. (2011). P-Exponential Random Number Generator. Actual Problems of Aviation and Astronautics, 7, 316-317.
3. Erlang distribution. (Accessed March 6, 2023). Retrieved from http://algolist.ru/maths/matstat/erlang/index.php#:~:text=%D0%A0%BC
4. How to use Wireshark for traffic analysis. (Accessed March 6, 2023). Retrieved from https://losst.pro/kak-polzovatsya-wireshark-dlya-analiza-trafika
5. Yakupov, D.O. (2023). Application for generating packets in computer networks using stochastic distribution models. Retrieved from https://elibrary.ru/item.asp?id=50133060
6. Tarasov, V.N., Bakhareva, N.F., Gorelov, G.A., & Malakhov, S.V. (2014). Analysis of incoming traffic at the level of three moments of time interval distributions. Information Technology, 9, 54-59.
7. Emulation of the influence of global networks. (Accessed May 10, 2023). Retrieved from https://habr.com/ru/articles/24046/
8. Performance Tuning Guide. (Accessed May 10, 2023). Retrieved from http://www.regatta.cs msu.su/doc/usr/share/man/info/ru_RU/a_doc_lib/aixbman/prftungd/2365c91.html
9. Ethernet/Fast Ethernet network algorithms. (Accessed May 10, 2023). Retrieved from https://intuit.ru/studi/professional_retraining/943/sources/57/lecture/1690?page=2
10. Providing data packets with accurate timestamps in network monitoring systems. (Accessed May 10, 2023). Retrieved from http://www.treatfake.ru/solutions/network-monitoring-systems/accurate-time-stamping-of-data-packets-in-network-monitoring-systems
First Peer Review
Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.