
Software systems and computational methods
Reference:

Optimization of a recurrent neural network using automata with a variable structure

Dimitrichenko Dmitriy Petrovich

PhD in Technical Sciences

Researcher, Institute of Applied Mathematics and Automation

360000, Russia, Nalchik, Shortanova St., 89a

dimdp@rambler.ru
DOI:

10.7256/2454-0714.2023.4.69011

EDN:

FEIPTC

Received:

13-11-2023


Published:

20-11-2023


Abstract: The subject of this study is the identification of a set of common structural properties inherent in recurrent neural networks and stochastic automata, whose distinguishing feature is purposeful behavior in dynamic environments. The necessary commonality of properties is revealed both in the process of their functioning and in the process of their training (tuning). The author considers in detail the formalization of purposeful behavior, the designs of the automata, and a comparative analysis of those designs. From the revealed commonality of functioning and the established one-to-one correspondence between the neurons of a fully connected recurrent neural network and the states of a probabilistic automaton with a variable structure, it follows that the structure of a tuned stochastic automaton can serve as a reference for the set of connections of a recurrent neural network. This allows redundant states (neurons) and the connections between them to be removed already at the tuning stage, based on the parameters of the corresponding automaton. The methodology of the research is the construction of a one-to-one correspondence between the neurons of a fully connected recurrent neural network and the internal states of an automaton with a variable structure, together with the transition probabilities that remain relevant after the tuning process. Under this correspondence, the transition probabilities of the automaton become the connection weights between the neurons of the optimal configuration. The main conclusions of the study are: 1. Comparing the structures of recurrent neural networks and automata with a variable structure makes it possible to exploit the advantages of an automaton with a variable structure for the problem of expedient behavior in dynamic environments and to build a recurrent neural network on its basis; 2. The correspondence between the internal structure of a recurrent neural network and that of an automaton with a variable structure allows the trained network to be freed from redundant neurons and redundant connections already at the training stage; 3. Because, under a nonlinear learning-rate law, an automaton with a variable structure approaches the automaton with linear tactics that is optimal for the given conditions, a logical analysis of the structure of the resulting recurrent neural network becomes possible.


Keywords:

neuron, time sequence, context, control task, expedient behavior, linear tactics, probability matrix, Markov chain, automaton, recurrent neural network


Introduction

The need to solve problems related to pattern recognition, diagnostics of the situations and states of complex systems, division of objects into classes, and data clustering [1-5], as well as forecasting and control tasks [6], has led to the creation of a huge variety of types of neural networks.

The various types of neural networks used in practice differ both in their appropriate scope of application and in their training procedures.

A common example of a neural network is the single-layer perceptron [1-3], consisting of a fixed number of neurons determined by the number of classes of recognized objects. Each of these neurons receives input data: numbers, for example, the pixels of a digitized image. The inputs of each neuron are weighted with positive or negative numbers. Each neuron has a degree of sensitivity: a threshold function or, more often, a smooth differentiable function replacing it, typically a sigmoid or a hyperbolic tangent. At the output, the network forms a response: as a rule, a one at the output of the neuron whose index is the number of the class it signals.
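
As an illustration, here is a minimal sketch of such a single-layer classifier in Python (the shapes, the sigmoid choice, and the argmax read-out are assumptions for the example, not a prescription from the text):

```python
import numpy as np

def sigmoid(z):
    # Smooth, differentiable replacement for the hard threshold function
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, W, b):
    # Each of the k neurons weighs the input values (e.g. pixels) with
    # positive or negative weights and passes the sum through a sigmoid.
    activations = sigmoid(W @ x + b)
    # The response is the index of the neuron signaling its class.
    return int(np.argmax(activations))

# Hypothetical sizes: 784 input pixels, 10 object classes.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 784))
b = np.zeros(10)
x = rng.random(784)           # a digitized image
print(classify(x, W, b))      # the recognized class number
```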

The complication of this fairly simple architecture eventually led to the appearance of multi-layered convolutional networks (deep learning networks).

Practice has shown that the choice of neural network topology is usually dictated by the specifics of the task.

Once the topology has been chosen, the question arises of the number of layers and of the number of neurons in each layer.

Theoretically, the number of layers and the number of neurons in each layer can be arbitrary, but in practice they are limited by the resources of the computer or specialized chip, i.e., by the features of the computing environment on which the neural network is to be implemented. At the same time, an empirical regularity holds: the more complex the constructed neural network, the more complex the tasks it can solve.

At the same time, the developer of a neural network has to be guided by several fundamental principles:

1. The network's capabilities increase with the number of neurons composing it, the density of connections between them, and the number of layers;

2. Introducing feedback, along with increasing the network's capabilities, raises the question of the dynamic stability of the network as a whole;

3. Complicating the network's functioning algorithms (including, for example, introducing several types of synapses: excitatory, inhibitory, and constant-value) also improves the quality of the solution obtained;

4. Increasing the complexity of the neural network leads to a nonlinear growth of the computational cost of its training and operation.

The next factor affecting the quality of the solutions obtained by a neural network is the representativeness of the training sample.

The neural network learns exactly those patterns in the data that are contained (perhaps implicitly) in the training sample, so insignificant or erroneous examples will also be assimilated by the network as correct references, which will affect the result obtained.

At the training stage, in addition to the quality of the selection of representative data, the training time plays a significant role. Long training may require significant resources, while insufficient training time will not allow the network to capture the significant and important patterns contained in the training sample, even a representative one with correctly selected reference examples.

There are two fundamental principles of neural network learning (tuning):

1. Strengthening or weakening the connections between neurons by changing the weights;

2. Changing the topology of the neural network itself (constructive methods of training neural networks) [7].

How well the network solves the tasks assigned to it during its immediate, purposeful (operational) functioning depends on how well the training stage is performed.

Neural network training can be conducted with or without a teacher (supervised or unsupervised). In the first case, the network is presented with the values of both the input and the desired output signals, and it adjusts the weights of its synaptic connections by some internal algorithm so that the reference input states correspond to the expected values of the output neurons. In the second case, the outputs of the neural network are formed independently, and the weights are changed by an algorithm that takes into account only the input signals and the signals derived from them.
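
For concreteness, here is a minimal sketch of both regimes for a single layer (the delta-rule and Hebbian-style updates are illustrative stand-ins for "some internal algorithm", not the specific rules the text has in mind):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def supervised_step(W, x, target, lr=0.1):
    # "With a teacher": both the input and the desired output are given;
    # the weights shift so that the actual output approaches the reference.
    y = sigmoid(W @ x)
    error = target - y
    return W + lr * np.outer(error * y * (1.0 - y), x)

def unsupervised_step(W, x, lr=0.01):
    # "Without a teacher": only the input and the network's own derived
    # output take part in the update (a Hebbian-style rule).
    y = sigmoid(W @ x)
    return W + lr * np.outer(y, x)
```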

Since a representative training sample contains knowledge about the subject area in explicit form (reference rules selected by an expert) and implicit form (environmental signals in unsupervised learning), the question of the logical interpretation of the decisions produced by a correctly functioning neural network is essential. Such logical rules would make it possible to describe the regularities of the subject area explicitly.

One of the approaches to solving this issue is logical neural networks [8, 9].

Despite the listed variety of neural network architectures and training methods, they share a common constraint: both the input and the output vectors have fixed sizes.

This, in turn, imposes certain restrictions on the number of classes and, for example, on the resolution of the analyzed images.

On the other hand, a prerequisite for the effectiveness of a learning algorithm such as error backpropagation is that the reference vectors be presented in random order within each training epoch.

When it is necessary to analyze sequentially arriving data, for which the order of the data in the sequence is essential, recurrent neural networks are an appropriate solution [10, 11].

In this case, the input of the neural network receives not only the current (analyzed) data but also data on the state of the network itself obtained at the previous step of its functioning.

This makes it possible to account for a memory effect, or the context of the information, which is of great importance for analyzing time sequences.

The first such network was the Hopfield network, which made it possible to organize an associative memory cell [11].

Once trained, such a network could recover the value of the reference sample from noisy data supplied to the input.

A further step was the emergence of Elman networks, which could classify data shifted in time, assigning such samples to a single class of reference objects.

Practice has shown [10, 11] that recurrent neural networks cope well with processing data represented as a time series (time sequence) x(t), t = 0, ..., T, taking into account the importance of extracting context (a minimal recurrent step is sketched below).
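
A minimal sketch of one recurrent step (Elman-style; the tanh nonlinearity and the shapes are assumptions of this illustration):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_in, W_rec, b):
    # The current element x(t) of the sequence is combined with the
    # network's own state from the previous step, carrying the context.
    return np.tanh(W_in @ x_t + W_rec @ h_prev + b)

def run_sequence(xs, W_in, W_rec, b):
    # Process the time sequence x(t), t = 0, ..., T, threading the state.
    h = np.zeros(W_rec.shape[0])
    for x_t in xs:
        h = rnn_step(x_t, h, W_in, W_rec, b)
    return h  # the final state summarizes the whole sequence
```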

This property allows the following intellectual tasks to be solved:

1. Building a language model;

2. Frequency analysis of data;

3. Models of attention focusing;

4. Recognition of details in an image based on context and environment;

5. Solving the control problem of building a system of optimal responses to environmental events.

The construction of a language model is based on the analysis (in its simplest form) of paired combinations of words.

The construction of a frequency model allows the principle of Markov chains to be extended to more complex cases.

The construction of an attention model makes it possible to find the most important (significant) data in the structure of a time series, classifying the rest as secondary.

The classification of details in an image, taking into account context and environment, makes it possible to build more accurate recognition models.

Solving the control problem with context involved benefits from the fact that similar situations, as a rule, require similar control commands to be issued to the actuators, which greatly facilitates learning. The presence of context makes it possible to label situations by their degree of similarity.

The latter aspect makes recurrent neural networks preferable in control tasks, distinguishing this type of network by a higher learning rate and better expressiveness of the obtained result.

In this paper, we propose a solution to the control problem using a recurrent neural network.

The basis for its construction in the trained state is a stochastic automaton with a variable structure.

Representing a recurrent neural network in the form of a stochastic automaton, and using the reinforcement learning algorithm that tunes this automaton as the training procedure, makes it possible to:

1. Minimize the structure of the recurrent neural network;

2. Eliminate redundant neurons where necessary;

3. Form the structure of a recurrent neural network that corresponds most fully to the structure of the time sequence of data.

 

The task of formalizing purposeful behavior

The neural networks and the automata (with linear tactics and stochastic) discussed below have much in common: biological prototypes, a mathematically grounded commonality, and experimentally confirmed positive results of functioning.

This makes it possible to transfer the results of the stochastic automaton tuning algorithm to the construction of an optimal configuration (within a given subject area) of a recurrent neural network.

Here is a formal description of the automaton:

X = {x1, ..., xk} is the set of signals coming from the environment to the inputs of the automaton (the input alphabet).

D = {d1, ..., dm} is the set of actions the automaton is able to perform with its actuators (the output alphabet).

S = {s1, ..., sn}, 2 ≤ m ≤ n, is the finite set of internal states of the automaton (the internal alphabet).

In general, the number of internal states of the automaton and the number of potentially possible actions affecting the environment need not coincide; several states may then correspond to one action.

The rules of the automaton's functioning at discrete moments of time are unambiguously set by two functions:

The internal state transition function st+1 = F(st, xt), together with the initial internal state s0 at time t0.

The function mapping internal states to output signals (actions): dt = G(st). It is not at all necessary that different states correspond to different actions.

For n = 1 we obtain a trivial automaton with a single, non-switchable internal state s = s1, performing the single action d = d1 regardless of the signal on its receptors.
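
The definitions above translate directly into code; a minimal sketch (the dictionary encoding of F and G is an implementation choice):

```python
class Automaton:
    # st+1 = F(st, xt): the internal state switches on each input signal;
    # dt = G(st): the action depends only on the current internal state.
    def __init__(self, F, G, s0):
        self.F = F      # transition table: (state, input) -> state
        self.G = G      # output table: state -> action
        self.s = s0     # initial internal state s0

    def step(self, x):
        self.s = self.F[(self.s, x)]
        return self.G[self.s]

# The trivial n = 1 automaton: one state, one action, any input ignored.
trivial = Automaton(F={("s1", x): "s1" for x in ("x1", "x2")},
                    G={"s1": "d1"}, s0="s1")
print(trivial.step("x1"))  # always d1
```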

We will consider automata as cybernetic devices capable of implementing one or another behavior strategy. The analysis of automaton behavior is successfully formalized using the apparatus of game theory [12]. This choice is due to the fact that game theory makes it easy to create formal rule systems (constructed environments with predictable responses) and to express the consequences of choices in terms of winning and losing, which translates readily into the language of automata as "reward" and "punishment" for the chosen actions.

Unlike a human player, automata do not have empirical knowledge about the rules of a particular game (the laws of the chosen "environment").

This property allows us to compare the degree of expediency of the behavior of different automata within a single rating scale, as more or less effective players.

The simplest game system, well researched in biological experiments, is a model of a T-shaped maze.

Behavior strategies in the T-shaped maze model are described by two possible actions of an automaton, or of an animal in Thorndike's experiment:

D: d1 – "left branch",

d2 – "right branch".

The result of the choice made is perceived by the animal, or registered by the automaton, as one of two possible reactions of the environment:

X: x1 – "penalty",

x2 – "reward".

Each of these actions is associated with its own given probability of punishment (reward), initially unknown to the automaton.

It is required to find a design of the automaton that ensures finding the most expedient action, i.e., the one associated with the minimum total penalty (or the maximum total reward).

Such constructions of automata were found by M. L. Tsetlin and his followers [13-16].

 

Environment models and automata designs

An environment E in which the probabilities of receiving reward and punishment for the selected actions remain constant over time is called stationary.

To describe a stationary environment, a pair of probability values is given for each of the m actions: reward p and punishment 1 − p.

Thus, to specify a stationary environment formally, it suffices to give an m-component probability vector:

E = (e1, …, em) = (p1, …, pm).
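
A minimal sketch of such an environment (the +1/−1 signal encoding follows the penalty/reward convention used later in the text):

```python
import random

class StationaryEnvironment:
    # E = (p1, ..., pm): action j is rewarded with probability pj
    # and penalized with probability 1 - pj, constant over time.
    def __init__(self, reward_probs):
        self.p = reward_probs

    def respond(self, action):
        return +1 if random.random() < self.p[action] else -1

# T-maze example: the "left branch" rewards more often than the "right".
env = StationaryEnvironment([0.8, 0.3])
```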

To find the optimal action in such a stationary environment, M. L. Tsetlin proposed the design of an automaton with linear tactics.

In accordance with the theorem he proved, such an automaton almost never leaves the optimal action once found, choosing it as if it knew the maximum reward probability corresponding to this action.

The number q of consecutive states assigned to a given action is called the memory depth of the automaton. The greater q is, the more inertial the automaton: a longer total sequence of penalties is required to force it to change actions (a minimal sketch of this design follows).
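
A minimal sketch of Tsetlin's design for two actions (the signed-integer state encoding is an implementation choice of this illustration):

```python
class LinearTacticsAutomaton:
    # States -q..-1 belong to action 0, states 1..q to action 1.
    # Reward pushes the state deeper into its branch (more inertia);
    # penalty pushes it toward the boundary and finally flips the action.
    def __init__(self, q):
        self.q = q
        self.state = 1

    def action(self):
        return 0 if self.state < 0 else 1

    def update(self, signal):          # signal: +1 reward, -1 penalty
        if signal > 0:
            if abs(self.state) < self.q:
                self.state += 1 if self.state > 0 else -1
        elif abs(self.state) > 1:
            self.state -= 1 if self.state > 0 else -1
        else:
            self.state = -self.state   # boundary crossed: switch action
```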

Most models of the environment E must necessarily include a time factor.

For these purposes a dynamic environment is constructed, representing, in the general case, a Markov chain of stationary environments.

The dynamic environment is formed from a set of stationary environments e1, e2, ..., el, each offering the same m actions d1, d2, ..., dm but characterized by its own penalty (reward) probabilities.

Switching between the l stationary environments is carried out in accordance with the probability matrix defining the corresponding Markov chain: P = (pij) of dimension l × l, i = 1, ..., l, j = 1, ..., l.

Here the matrix element pij is equal to the probability of switching from environment ei to environment ej at the current time t.
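
Continuing the sketches above (and reusing the StationaryEnvironment class), a dynamic environment is then a Markov chain over stationary ones; the row sampling via random.choices is an implementation choice:

```python
import random

class DynamicEnvironment:
    # A Markov chain over l stationary environments: at each step the
    # current environment e_i switches to e_j with probability p_ij.
    # The automaton is never told that a switch has occurred.
    def __init__(self, environments, P):
        self.envs = environments   # list of StationaryEnvironment
        self.P = P                 # l x l row-stochastic matrix
        self.i = 0

    def respond(self, action):
        row = self.P[self.i]
        self.i = random.choices(range(len(self.envs)), weights=row)[0]
        return self.envs[self.i].respond(action)
```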

It is obvious that under these conditions an action that is optimal within one stationary environment, for example e1, will turn out to be suboptimal in environment e2, forcing the automaton to switch to the action that is optimal in the new environment e2. The fact of switching from one stationary environment to another also remains unknown to the automaton.

The theorem stating that, in a stationary environment E, increasing the memory depth q of an automaton with linear tactics asymptotically brings it to the minimum penalty min(p) [13] does not hold in a dynamic environment.

The optimal value of q cannot be obtained analytically [16] and is found only by computational experiment: the memory depth q of the automaton with linear tactics is plotted along the horizontal axis, and the corresponding mathematical expectation of the error over a fixed number of iterations, for example 10,000, along the vertical axis. Qualitatively, for small values of q the automaton switches from one action to another too often, without having time to "fix" the optimal action. Increasing the memory depth q at first reduces the error, but as q grows further the error rises again: the automaton becomes too inert for the given dynamic environment, "always late", lingering too long on an action that ceased to be effective when the environment switched.

Naturally, the question arose of whether there exists an automaton that, functioning in a dynamic environment, would adapt to its conditions on its own, without the need to select the memory depth q for each dynamic environment generated by the corresponding Markov chain. Such a probabilistic automaton was proposed by V. I. Varshavsky [15].

 

Automaton with variable structure

An automaton with a variable structure is specified in matrix form.

The number of matrices equals the number of input signals: P1, ..., Pk, where each matrix determines the probabilities of transition from state i to state j when the corresponding signal x1, ..., xk is present at the automaton's input.

Each matrix is of dimension n × n, where n is the number of internal states of the automaton. The elements of the matrices P1, ..., Pk satisfy the following conditions:

1. The value of any element of each matrix lies in the segment [0, 1].

2. The sum of the elements in each row of any of the matrices P1, ..., Pk equals one.

In the initial state and at every step of the algorithm, all matrices of the automaton must satisfy these stochasticity conditions.

In the course of the operation of the automaton with a variable structure, the algorithm acts to decrease the probabilities of transitions that bring the automaton a penalty and to increase the probabilities of transitions that bring it a reward.

Let the set of input signals X consist of two elements, x1 = "penalty" and x2 = "reward"; such an automaton is then determined by two matrices, P− and P+. As an example, consider an automaton with four states s1, s2, s3, s4 and two actions, each assigned its own subset of states, distinct from the other: d1(s1, s2) and d2(s3, s4).

At the initial moment of time, the matrices P+ and P− contain equal probabilities of transition between any pair of states.

The initial state is s1, and the initial input signal is +1, i.e., the currently active matrix of the automaton is P+.

At the initial moment of time, the matrices P+ and P− have the following form, describing equiprobable transitions:

0.25, 0.25, 0.25, 0.25

0.25, 0.25, 0.25, 0.25

0.25, 0.25, 0.25, 0.25

0.25, 0.25, 0.25, 0.25

At the first step, the choice is made by the first row of the matrix P+, since the initial state of the automaton is s1. Let the transition s1 → s4 be selected by a random number generator. State s4 corresponds to action d2, and action d2 elicits the environment's reaction "penalty" = −1. Consequently, the transition s1 → s4 and the action d2 associated with it are unfavorable for the automaton, and the probability p+14 should therefore be lowered, for example by 0.03.

To preserve the stochasticity of the matrix P+, the remaining probability values in the first row should be increased by corresponding amounts.

The matrix P+ will take the following form:

0.26, 0.26, 0.26, 0.22

0.25, 0.25, 0.25, 0.25

0.25, 0.25, 0.25, 0.25

0.25, 0.25, 0.25, 0.25

Since the automaton received a penalty signal at its input, it performs the next transition from the current state s4 using the values of the fourth row of the matrix P−. Let the transition s4 → s2 be made, performing the corresponding action d1, for which the automaton received a reward of +1.

Such a transition is regarded by the automaton as positive; therefore the transition probability p−42 is increased, and the other elements of the fourth row of the matrix P− are correspondingly decreased.

0.25, 0.25, 0.25, 0.25

0.25, 0.25, 0.25, 0.25

0.25, 0.25, 0.25, 0.25

0.24, 0.28, 0.24, 0.24

Subsequent steps of tuning the stochastic automaton are performed in the same way.
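
Gathering the walkthrough into code, here is a minimal sketch of the tuning cycle (delta = 0.03 as in the example above; the clipping of row entries is simplified relative to a careful implementation):

```python
import random

class VariableStructureAutomaton:
    # Two matrices of transition probabilities: P[+1] is used after a
    # reward, P[-1] after a penalty; both start out equiprobable.
    def __init__(self, n, action_of_state, delta=0.03):
        self.n = n
        self.P = {+1: [[1.0 / n] * n for _ in range(n)],
                  -1: [[1.0 / n] * n for _ in range(n)]}
        # e.g. [0, 0, 1, 1] encodes d1(s1, s2), d2(s3, s4)
        self.action_of_state = action_of_state
        self.delta = delta
        self.state = 0
        self.last_signal = +1

    def step(self):
        # Choose the next state from the row selected by the previous signal.
        row = self.P[self.last_signal][self.state]
        nxt = random.choices(range(self.n), weights=row)[0]
        prev, self.state = self.state, nxt
        return prev, nxt, self.action_of_state[nxt]

    def learn(self, prev, nxt, signal):
        # Reinforce the used transition on reward, weaken it on penalty,
        # spreading the difference over the rest of the row so that the
        # row still sums to one (stochasticity is preserved).
        row = self.P[self.last_signal][prev]
        change = self.delta if signal > 0 else -min(self.delta, row[nxt])
        row[nxt] += change
        share = change / (self.n - 1)
        for j in range(self.n):
            if j != nxt:
                row[j] -= share
        self.last_signal = signal

# One interaction cycle with an environment `env` from the sketches above:
#   prev, nxt, action = automaton.step()
#   signal = env.respond(action)
#   automaton.learn(prev, nxt, signal)
```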

The learning rate of the automaton naturally depends on the magnitude of the change applied to the elements of these matrices.

Linear laws, i.e., a constant magnitude of change of the transition probabilities in the matrices P+ and P−, do not always lead to optimal constructions similar to the automata of V. I. Krinsky or H. Robbins [16].

If a nonlinear law of change of the matrix elements is introduced, the original matrices with equal transition probabilities gradually converge to matrices of zeros and ones, corresponding to an automaton with linear tactics with the memory depth q* optimal for the given dynamic environment.

Following [16], we state an important conclusion comparing the functioning of an automaton with linear tactics and an automaton with a variable structure:

• Automata of identical design in paired matrix games mostly play to a draw, and each, as a rule, achieves the maximum possible payoff under the given conditions.
• When two different stochastic automata with variable structure play each other, a small advantage goes to the one with the smaller learning coefficient and the relatively smaller number of states.
• In a game between a stochastic automaton with a variable structure and an automaton with linear tactics, the stochastic automaton, as a rule, wins regardless of the memory capacity of the automaton with linear tactics.

The latter circumstance allows us to conclude that in stationary random environments an automaton with a variable structure functions no worse than an automaton with linear tactics, asymptotically approaching the group of states associated with the best action within the given stationary environment E.

And since in a game between an automaton with linear tactics and an automaton with a variable structure the latter wins, this indicates the higher adaptive qualities of the variable-structure automaton compared with an automaton characterized by a fixed memory depth.

 

Building a recurrent neural network

In order to use the above results, it is necessary to build a recurrent neural network, which is the equivalent of an automaton with a variable structure.

To do this, we prove the following statement:

Statement. There is a one-to-one correspondence between a recurrent neural network and the structure of a corresponding probabilistic automaton designed to organize expedient behavior in a given dynamic environment corresponding to a certain Markov chain.

The proof is constructive, based on a one-to-one correspondence between the neurons of the constructed recurrent neural network and the given states of the automaton with a variable structure; the transition probabilities between the states of the automaton serve as the weights of the connections between the neurons of the desired recurrent neural network.

The method of proof also yields an algorithm for constructing a recurrent neural network from the initial set of internal states and the transition matrices of an automaton with a variable structure.
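
A hypothetical sketch of that construction (averaging the two matrices into a single weight matrix and the pruning threshold eps are assumptions of this illustration, not part of the proof):

```python
import numpy as np

def automaton_to_rnn(P_plus, P_minus, eps=1e-3):
    # One neuron per internal state; the tuned transition probabilities
    # become the weights of the recurrent connections between neurons.
    W = (np.asarray(P_plus) + np.asarray(P_minus)) / 2.0
    # Prune connections whose probability has decayed to (near) zero ...
    W[W < eps] = 0.0
    # ... and remove neurons left without incoming or outgoing connections,
    # which frees the network from redundant neurons at construction time.
    keep = (W.sum(axis=0) > 0) | (W.sum(axis=1) > 0)
    return W[np.ix_(keep, keep)], keep
```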

Corollary 1. The positive qualities and advantages of a stochastic automaton in comparison with an automaton with linear tactics carry over automatically to the corresponding recurrent neural network.

Corollary 2. From the proven statement and the fact that an automaton with linear tactics corresponds to some pre-trained neural network, it follows that a recurrent neural network corresponding to an automaton with a variable structure has an advantage over a neural network with a fixed structure when working with time series (time sequences): the fixed network was trained on a finite segment of a time series, whereas a recurrent network with a variable structure adapts each time to local features of the time series that could not have entered the original training sample.

 

Conclusion

The main conclusions of the study are:

Recurrent neural networks are an effective method of analyzing context-dependent data based on time sequences.

Formalizing the context as an internal state makes it possible to compare the functioning of such a neural network with a corresponding probabilistic automaton with a variable structure and its current set of connections between states.

This makes it possible to exploit the advantages of an automaton with a variable structure for the problem of expedient behavior in dynamic environments and to build a recurrent neural network on its basis.

The correspondence between the internal structure of a recurrent neural network and an automaton with a variable structure allows the trained network to be freed from redundant neurons and connections already at the training stage.

Since, under a nonlinear learning-rate law, an automaton with a variable structure approaches the automaton with linear tactics that is optimal for the given conditions, a logical analysis of the structure of the resulting recurrent neural network becomes possible.

References
1. Gorban, A.N., & Rossiev, D.A. (1996). Nejronnye seti na personalnom kompyutere. Novosibirsk: Nauka, Sibirskaya izdatelskaya firma RAN.
2. Rutkovskaya, D., Pilinskij, M., & Rutkovskij, L. (2004). Nejronnye seti, geneticheskie algoritmy i nechyotkie sistemy. Moscow: Goryachaya liniya – Telekom.
3. Dimitrichenko, D.P. (2021). A Method for Diagnosing a Robotic Complex Using Logical Neural Networks Apparatus. 2021 International Russian Automation Conference (RusAutoCon), pp. 907-911.
4. Kazakov, M.A. (2022). Clustering Algorithm Based on Feature Space Partitioning. 2022 International Russian Automation Conference (RusAutoCon), pp. 399-403.
5. Zhilov, R.A. (2022). Application of the Neural Network Approach when Tuning the PID Controller. 2022 International Russian Automation Conference (RusAutoCon), pp. 228-233.
6. Shibzuxov, Z.M. (2006). Konstruktivnye metody obucheniya sigma-pi nejronnyx setej. Moscow: Nauka.
7. Barskij, A.B. (2016). Logicheskie nejronnye seti. Moscow: Internet-universitet informacionnyx texnologij Binom, Laboratoriya znanij.
8. Dimitrichenko, D.P. (2022). Optimization of the Structure of Variable-Valued Logical Functions when Adding New Production Rules. 2022 International Russian Automation Conference (RusAutoCon), pp. 240-245.
9. Osovskij, S. (2017). Nejronnye seti dlya obrabotki informacii. Moscow: Goryachaya Liniya – Telekom.
10. Pospelov, D.A. (1966). Igry i avtomaty. Moscow: Energiya.
11. Cetlin, M.L. (1969). Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskix sistem. Moscow: Nauka.
12. Pospelov, D.A. (1970). Veroyatnostnye avtomaty. Moscow: Energiya.
13. Varshavskij, V.I. (1973). Kollektivnoe povedenie avtomatov. Moscow: Nauka.
14. Varshavskij, V.I., & Pospelov, D.A. (1984). Orkestr igraet bez dirizhera: razmyshleniya ob evolyucii nekotoryx texnicheskix sistem i upravlenie imi. Moscow: Nauka.
15. Vakulenko, S.A., & Zhixareva, A.A. (2018). Prakticheskij kurs po nejronnym setyam. SPb.: Universitet ITMO.
16. Xajkin, S. (2006). Nejronnye seti: polnyj kurs, 2-e izd. Moscow: Izdatelskij dom "Vilyams".

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

This article is based on an interesting and theoretically useful topic. In the introduction, the author draws attention to the fact that the need to solve problems related to pattern recognition, diagnosing situations and states of complex systems, dividing objects into classes and clustering data, as well as forecasting and control tasks, has led to the creation of a huge variety of types of neural networks. Apparently, the author proceeds from the need to choose among neural networks and tries to justify the optimization of a recurrent neural network using automata with a variable structure. But there is no formulation of the purpose of the study in the text, and it is therefore unclear why he proposes to do this. In addition, the relevance of studying the optimization of a recurrent neural network using automata with a variable structure is shown insufficiently. It is highly advisable to formulate the purpose of the study and to justify its relevance. The text presents the theory of the issue well, but it is written more as educational material than as research; the style of presentation resembles a survey. In addition, there is no formulation of scientific novelty, although the novelty is evident; it only needs to be formulated correctly, and there are grounds for this in the text. For example, the author notes that the paper proposes a solution to the control problem using a recurrent neural network, with a stochastic automaton with a variable structure as the basis for its construction in the trained state. This provision can serve as the basis for formulating the scientific novelty. It is obvious that the author has a good command of the literature on the subject of the study. The information he provides is logical and understandable and contributes to understanding the essence of neural network learning. It is noted that neural network training can be conducted with or without a teacher: in the first case, the network is presented with the values of both the input and the desired output signals and adjusts the weights of its synaptic connections by some internal algorithm so that the reference input states correspond to the expected values of the output neurons; in the second case, the outputs of the neural network are formed independently, and the weights are changed by an algorithm that takes into account only the input and derived signals. The reviewer fully agrees with the author's opinion on the importance of a representative training sample. It is noted that "since a representative training sample contains knowledge about the subject area in explicit form (reference rules selected by an expert) and implicit form (environmental signals during unsupervised learning), the question of the logical interpretation of decisions obtained by a correctly functioning neural network is essential. The presence of such logical rules would make it possible to explicitly describe the patterns of the subject area under consideration." This is correct. The consistent sequence of presentation should be recognized as an advantage of the article, making it understandable even to non-professionals. As an argument, it is indicated, with reference to practice, that recurrent neural networks do a good job of processing data represented as a time series (time sequence) x(t), t = 0, ..., T, taking into account the importance of highlighting the context.
The author believes that this property allows the following intellectual tasks to be solved: 1. Building a language model; 2. Frequency analysis of data; 3. Models of attention focusing; 4. Recognition of details in an image based on context and environment; 5. Solving the control problem to build a system of optimal responses to environmental events. Methodologically, the author relies on the principles of unity and similarity between the functioning of higher nervous activity and machine learning systems. He does not say so directly, but it would be worth stating in order to widen the circle of readers: the style and language of the presentation are quite understandable to a specialist in higher nervous activity, especially where neurons are concerned. The article would be made more concrete by a formulation of the subject of the study, which is not yet present. The content of the work itself indicates that the construction of a language model is based on the analysis of paired combinations of words, that the construction of a frequency model allows the principle of Markov chains to be extended to more complex cases, and that the construction of an attention model makes it possible to find the most important data in the structure of time series, classifying the rest as secondary. The article concretizes the position that representing a recurrent neural network in the form of a stochastic automaton and using a reinforcement learning algorithm to configure this automaton as the learning procedure allows one to: 1. Minimize the structure of the recurrent neural network; 2. Eliminate redundant neurons where necessary; 3. Form the structure of the recurrent neural network that corresponds most fully to the structure of the time sequence of data. The article is not overloaded with mathematical formulas, which can also be counted among its advantages; the author's main emphasis is on clarifying the essence, as can be seen even in the formal description of the automaton. The author analyzes the variants and shows that an algorithm for constructing a recurrent neural network from the initial set of internal states and transition matrices of an automaton with a variable structure follows from the method of proof. In the end, it is concluded that recurrent neural networks are an effective method for analyzing context-dependent data based on time sequences, and that the correspondence between the internal structure of a recurrent neural network and an automaton with a variable structure makes it possible, already at the learning stage, to free the trained network from excess neurons and connections. One can agree with these conclusions. The bibliographic list consists of sources on the research topic, although it seemed to the reviewer that some sources could be more recent. Overall, the article makes a positive impression and, after revision, can be recommended for publication. The reviewer also holds another opinion: in a psychological journal, for example, this article could be published in its present form, since it is useful for psychologists to know the algorithms by which neural circuits function.