Philology: scientific researches
Reference:

The Use of Statistical Calculations in Determining the Necessary and Sufficient Volume of the Studied Material

Lalova Tat'yana Ivanovna

ORCID: 0000-0001-7026-0105

PhD in Philology

Associate professor at the Department of Russian and Foreign Languages of Russian University of Transport

127994, Russia, Moscow, Obraztsova str., 9, of. p. 9

t_lalova@mail.ru

DOI:

10.7256/2454-0749.2023.3.39312

EDN:

JLEGJQ

Received:

04-12-2022


Published:

04-04-2023


Abstract: This article reviews the most common statistical software packages for processing data arrays and proposes a method for determining "manually" the required amount of experimental material. Using descriptive statistics, a specific research situation is analyzed with formulas applied to a finite set of empirical data obtained in a sample of measurements: the "best" estimate of the "exact" value of the measured quantity is found and the accuracy of the measurements is determined. The calculations yield the percentage of permissible error and the amount of material needed to reduce it, which makes it possible to judge the reliability of the experimental results. The article also discusses the use of statistical software to prove the truth and reliability of conclusions obtained from experiments conducted in the course of scientific research. Data analysis by statistical calculation matters in many kinds of activity: in some professions it is used from time to time, in others often or even daily. With its help one can study and manage data arrays, draw conclusions from the results, and present them as tables or graphs in reports and scientific articles.


Keywords:

Software, pronunciation, descriptive statistics, sufficiency, necessity, experimental material, deviation, speakers, auditors, phonemes

This article is automatically translated.

This article addresses the use of statistical software to prove the truth and reliability of conclusions obtained in the course of experimental scientific research.

Statistical data analysis is an important element in any activity. Almost everyone needs it: civil servants, developers of various technologies, accountants and financiers, researchers in various fields, students and teachers. In some professions methods of statistical analysis are used from time to time, in others on a daily basis. The large software market offers a wide variety of application packages that are professionally focused on processing statistical information and make it possible to identify patterns against a background of randomness, draw informed conclusions, make forecasts, and assess the likelihood that they will come true. Among the many programs of this kind, users can choose the one best suited to the tasks they face. The following are currently considered the most convenient and popular [1]:

- Minitab
- StatSoft (STATISTICA)
- COMSOL
- Microsoft Excel
- SAS (Statistical Analysis Software)
- MATLAB
- SPSS (IBM)
- Stata
- XLSTAT
- Wizard (Mac)

These packages are not difficult to use and perform operations with high accuracy. They make it possible to analyze and manage data arrays, summarize the results, compile tables and graphs for reports and scientific articles, perform computer modeling, and so on; in other words, they handle general-purpose tasks. Their high performance allows the necessary calculations to be done quickly, both beginners and advanced users can work with them, and a convenient customer support system helps to resolve problems as they arise.

As an example, consider Microsoft Excel [2], one of the most popular and universal programs for statistics. Many people are familiar with it and know its functions, capabilities, and advantages, the chief of which is a set of data analysis tools (the "Analysis Package", called the Analysis ToolPak in English-language versions) designed to solve complex statistical problems. Statistical methods of data processing and analysis are implemented in Microsoft Excel as a variety of independent statistical functions (AVERAGE, MEDIAN, MODE, VAR, NORMDIST, POISSON, TINV and many others), as software tools for solving optimization problems, and as a special add-in, the "Analysis Package", which is supplied with the product and can be installed at the user's request. In particular, one of the main functions of the "Analysis Package" is descriptive statistics, which makes it possible to process a set (array) of numerical experimental data quickly and determine the confidence probability and interval or, given these parameters, to calculate a sufficient size of the data array.

Thus, with access to numerous user-friendly statistical programs, including those listed above, urgent problems in all areas of professional activity can be solved quickly and accurately. However, all of this software, like most other software, was developed by foreign IT companies, mainly American ones. Given the growing number of sanctions affecting various spheres of our country's life, it is difficult to predict whether these programs will remain available in Russia. At the same time, many data processing problems that require statistics can be solved "manually", whether or not computer programs are available. Of course, everything depends on the volume of material to be processed, the time allowed for the work, and the required accuracy of the calculations. Nevertheless, for scientific work such processing of an array of experimental data seems justified and not too time-consuming. We outline the principles of using this technique in an experiment and present the conclusions drawn.

The results of any research, in whatever field of science it is carried out, must be valid and reliable. Otherwise, the hypothesis put forward in the paper may be questioned. To avoid this, it should be proved that the volume of experimental material under consideration is necessary and sufficient to formulate the conclusions drawn. To this end, the results of the study must be processed statistically. Let us turn to the theoretical provisions of statistics on which the calculations are based.

Some information from descriptive statistics

When processing the results of measuring a quantity X that has a definite value but, because of various random factors, is measured with some random error, the following task arises: using a finite set of empirical data obtained in a sample of n measurements, find the "best" estimate of the "exact" value of the measured quantity X and determine the accuracy of the measurements.

The best estimate of the value of X is the sample mean $\bar{x}$. To estimate the deviation of the measured value from the true one, $(X - \bar{x})$, it is necessary to know the mean square deviation $\sigma$ of this distribution, which relates the confidence probability P to the confidence interval $\Delta x$.

The range of values $-\Delta x < X - \bar{x} < \Delta x$ within which the random error lies is called the confidence interval, and the probability that the error does not go beyond this range is the confidence probability.

Rules for processing direct multiple measurements [3]

When conducting a direct measurement of a certain quantity X, it is necessary to:

1. Perform multiple measurements $x_1, x_2, \dots, x_n$ under the same conditions and record them in a table.

2. Calculate the average value using the formula:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

3. Calculate the variance estimate:

$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

4. Calculate the root-mean-square error of the mean:

$S_{\bar{x}} = \sqrt{\frac{S^2}{n}}$.

5. Having set the required level of confidence probability P, determine Student's coefficient $t(P, n-1)$ from the table and the modulus of the confidence interval:

$\Delta x = t(P, n-1) \cdot S_{\bar{x}}$.

6. Having rounded the corresponding results, write down the answer in the form

$X = \bar{x} \pm \Delta x$ with confidence probability P.

When determining confidence intervals, confidence probability levels are usually assumed to be 0.9 or 0.95, less often 0.99.
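The six steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than the author's own procedure: the function name `confidence_interval` and its parameters are hypothetical, and Student's coefficient is assumed to be read from a table by the caller instead of being computed.

```python
import math


def confidence_interval(measurements, t_coefficient):
    """Return (mean, half_width) for a series of direct measurements.

    t_coefficient is Student's coefficient for the chosen confidence
    probability P and n - 1 degrees of freedom, taken from a table.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("at least two measurements are required")

    # Step 2: average value of the sample.
    mean = sum(measurements) / n
    # Step 3: variance estimate with n - 1 in the denominator.
    variance = sum((x - mean) ** 2 for x in measurements) / (n - 1)
    # Step 4: root-mean-square error of the mean.
    std_error = math.sqrt(variance / n)
    # Step 5: modulus (half-width) of the confidence interval.
    half_width = t_coefficient * std_error
    # Step 6: the answer is reported as mean ± half_width.
    return mean, half_width
```

The caller is expected to round the two returned numbers before reporting them, as step 6 prescribes.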

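Student's coefficient at P = 95% (Table 1).

Table 1

| n-1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| t | 4.30 | 3.18 | 2.78 | 2.57 | 2.45 | 2.37 | 2.31 | 2.26 |

| n-1 | 10 | 15 | 20 | 30 | 50 | 100 | 200 | ∞ |
|---|---|---|---|---|---|---|---|---|
| t | 2.23 | 2.13 | 2.09 | 2.04 | 2.01 | 1.98 | 1.97 | 1.96 |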

As an example of applying this theory, we determine the reliability of the results of a study on the pronunciation of the rounded, semi-open front vowel [oe] in the position after the plosive palatal consonants [k] and [g]. The experiment was conducted on the material of P. Verlaine's poem "Autumn Song" ("Chanson d'automne"). The text was recorded by 10 native Tamil speakers living in Pondicherry (India) [4] who are learning French as their first foreign language. The recording was listened to by 26 auditors who speak French well. The group of auditors included teachers and senior students of institutes and faculties of foreign languages in Moscow.

The auditory analysis produced 250 of the 260 possible responses, of which 208 were negative and 42 were positive. In other words, in 208 responses the sound [oe] in the position after [k] and [g] was judged to be pronounced incorrectly, while in 42 responses the pronunciation was judged to correspond to the orthoepic norm of the French language.

At the next stage of the experiment, it was necessary to determine how reliable the results were and whether enough material had been analyzed to support the conclusions. The study of how the sound [oe] is pronounced will be treated as the measurement of a certain parameter. The ten Indian speakers reading the same text will be treated as ten independent trials (Table 2), and the answers of the 26 auditors as the result of measuring the parameter in each trial. Having excluded one missing result from processing, we carry out statistical processing on 25 responses per speaker; the positive response of one auditor is therefore 4% of the 25 possible positive responses.

1. In accordance with the first rule for processing direct multiple measurements (each measurement is hereinafter called a trial), we tabulate the auditors' positive responses (in %) for each trial.

Table 2

| Trial | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $x_i$, % | 12 | 60 | 8 | 68 | 20 | 0 | 0 | 12 | 0 | 0 |


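2. Calculate the average value:

$\bar{x} = \frac{1}{10}\sum_{i=1}^{10} x_i = \frac{180}{10} = 18\%$.

3. Calculate the variance estimate:

$S^2 = \frac{1}{9}\sum_{i=1}^{10} (x_i - \bar{x})^2 = \frac{5736}{9} \approx 637.3$.

4. Calculate the root-mean-square error of the mean:

$S_{\bar{x}} = \sqrt{\frac{S^2}{10}} \approx 8.0\%$.

5. Given a confidence probability P = 0.95 and taking from the table t(0.95; 9) = 2.26, we calculate the confidence interval:

$\Delta x = t \cdot S_{\bar{x}} = 2.26 \cdot 8.0 \approx 18\%$.

6. Rounding the results, we get:

$X = (18 \pm 18)\%$ with confidence probability P = 0.95.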

This means that 18% of the Indian speakers pronounce the sound under study correctly, with an error of ± 18%. To reduce the error, the number of speakers should be increased. Other sounds can be studied in the same way, and the results then generalized to the whole language.
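The arithmetic of this example can be checked with a short script. The sketch below assumes only the percentages of Table 2 and the tabulated coefficient t = 2.26; the variable names are illustrative and do not come from the article.

```python
import math

# Positive responses per speaker, in percent (Table 2).
positive_pct = [12, 60, 8, 68, 20, 0, 0, 12, 0, 0]
# Student's coefficient for P = 0.95 and n - 1 = 9 (Table 1).
T_095_9 = 2.26

n = len(positive_pct)
mean = sum(positive_pct) / n
variance = sum((x - mean) ** 2 for x in positive_pct) / (n - 1)
std_error = math.sqrt(variance / n)
half_width = T_095_9 * std_error

print(f"mean = {mean:.1f} %")                          # mean = 18.0 %
print(f"variance estimate = {variance:.1f}")           # variance estimate = 637.3
print(f"error of the mean = {std_error:.2f} %")        # error of the mean = 7.98 %
print(f"confidence interval = ±{half_width:.1f} %")    # confidence interval = ±18.0 %
```

The half-width of about 18% is as large as the mean itself, which is why the text goes on to ask how many speakers would be needed for a narrower interval.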

The number of speakers required to ensure a given accuracy (confidence interval) can be estimated approximately. If we assume that the variance of the measurement results does not depend on their number, then from the expression for the confidence interval, $\Delta x = t\sqrt{S^2/n}$, it follows that:

$n = \frac{t^2 S^2}{(\Delta x)^2}$.

Taking the value t = 2.26 for the confidence probability P = 0.95 (for n - 1 = 9 from the table), the variance estimate $S^2 \approx 637.3$ obtained from the data of Table 2, and the confidence interval $\Delta x$ = 10%, we get:

$n = \frac{2.26^2 \cdot 637.3}{10^2} \approx 32.6$, that is, about 33 speakers.

That is, to reduce the research error to 10% with a confidence probability of 95%, the number of speakers should be at least tripled.
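The estimate of the required number of speakers can be reproduced with the sketch below. It relies on the same assumption as the text, namely that the variance does not change as speakers are added, and it keeps t = 2.26 fixed; the helper `required_sample_size` is a hypothetical name introduced here for illustration.

```python
import math


def required_sample_size(std_dev, t_coefficient, target_half_width):
    """Smallest n for which t * S / sqrt(n) <= target_half_width.

    Assumes the standard deviation S stays the same as n grows.
    """
    return math.ceil((t_coefficient * std_dev / target_half_width) ** 2)


# Positive responses per speaker, in percent (Table 2).
positive_pct = [12, 60, 8, 68, 20, 0, 0, 12, 0, 0]
n = len(positive_pct)
mean = sum(positive_pct) / n
std_dev = math.sqrt(sum((x - mean) ** 2 for x in positive_pct) / (n - 1))

needed = required_sample_size(std_dev, t_coefficient=2.26, target_half_width=10)
print(needed)  # 33
```

Because Student's coefficient decreases as the sample grows (Table 1 gives 2.04 for n - 1 = 30), keeping t = 2.26 makes this a cautious estimate rather than an exact one.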

To determine more accurately the percentage of all Pondicherry residents who reproduce the sound [oe] correctly, it is necessary to increase the number of auditors.

Based on the statistical analysis carried out, it should be concluded that, with a confidence probability of 95%, the sample of 250 responses shows that 82% of speakers pronounce the phoneme [oe] incorrectly, with an acceptable error of ± 18%.

Based on this, it can be stated that the results of the auditory analysis of the text of the poem should be considered statistically reliable.

References
1. Bochkarev S.V. [et al.] Planning and processing of experimental results: textbook. Stary Oskol: TNT, 2020. 508 p.
2. Knyazev B.A., Cherkasov V.S. The beginning of the processing of experimental data. Novosibirsk: NGU, 1996. 43 p.
3. 10 Best Statistics Software and Tools in 2022. https://softlist.com.ua/articles/10-luchshikh-programm-i-instrumentov-dlia-statisiki-v-2022-godu

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

Statistical data analysis is an important element in any activity. Undoubtedly, almost everyone needs it: government officials, developers of various technologies, accountants and financiers, researchers in various fields, students and teachers. Data processing is not a formal component but an integral part of assessing content. The reviewed article is devoted to the use of statistical software to prove the truth and reliability of conclusions obtained in experimental scientific research. I think this angle is conceptually sound and justified both methodologically and in terms of how objectively the problem is posed. The work is practically oriented; as the author notes, "the experiment was conducted on the material of P. Verlaine's poem "Autumn Song" ("Chanson d'automne"). The text was recorded by 10 native Tamil speakers living in Pondicherry (India) [4] who are learning French as their first foreign language. The recording was listened to by 26 auditors who speak French well. The group of auditors included teachers and senior students of institutes and faculties of foreign languages of Moscow", "As a result of the auditory analysis, 250 answers out of 260 possible were received. At the same time, 208 responses were negative and 42 were positive. Which means that for 208 auditors, the sound [oe] in the position after [k-g] is pronounced incorrectly. 42 participants of the auditory analysis considered the pronunciation of the sound in question as corresponding to the orthoepic norm of the French language." I think the statistical component is important for the reliability of the experiment; each stage is marked and accompanied by an assessment or comment. The calculation formulas illustrate the data, and the tables consolidate the results obtained. I think the work can serve as a model for writing new projects.
The conclusion contains the following information: "based on the statistical analysis carried out, it should be concluded that with a confidence probability of 95%, the sample size of 250 experiments ensures the accuracy of the experiment for 82% of speakers who incorrectly pronounce the phoneme [oe] with an acceptable error of ± 18%. Based on this, it can be stated that the results of the auditory analysis of the text of the poem should be considered statistically reliable." The main components of the work are in place; the novelty of the research lies in the analytical verification of the data assessment method used. The style of the work corresponds to the scientific genre, and no serious improvements are required. However, the author could expand the bibliography to include thematically related studies, which would lend the text more weight on the issue it addresses. In general, the purpose of the work has been achieved and its tasks have been solved; taking the above into account, I conclude that the article "The use of statistical calculations in determining the necessary and sufficient amount of research material" can be accepted for open publication in the journal "Philology: scientific research".