
Software systems and computational methods
Reference:

Research and development of algorithms for the formation of an effective ensemble of convolutional neural networks for image classification

Bondarenko Valerii Aleksandrovich

Graduate student, Department of Information Technology and Mathematics, Sochi State University

354002, Russia, Krasnodar Territory, Sochi, Verkhnyaya Lysaya Gora str., 64

valeriybbond@mail.ru
Popov Dmitrii Ivanovich

Doctor of Technical Sciences

Professor, Department of Information Technology and Mathematics, Sochi State University

354002, Russia, Krasnodar Territory, Sochi, Politechnicheskaya str., 7

damitry@mail.ru

DOI:

10.7256/2454-0714.2024.1.69919

EDN:

WZDHQO

Received:

20-02-2024


Published:

02-04-2024


Abstract: The object of the research is artificial neural networks (ANNs) with a convolutional architecture for image classification. The subject of the research is the study and development of algorithms for constructing ensembles of convolutional neural networks (CNNs) under the conditions of a limited training sample. The aim of the study is to develop an algorithm for forming an effective model based on an ensemble of CNNs that averages the results of the individual models, avoids overfitting while improving prediction accuracy, and can be trained on a small amount of data (fewer than 10 thousand examples). As the base network of the ensemble, an effective CNN architecture was developed that showed good results as a single model. The article also examines methods for combining the results of the ensemble models and provides recommendations on designing the CNN architecture. The research methods used are the theory of neural networks, machine learning theory, artificial intelligence, methods of algorithmization and programming of machine learning models, and a comparative analysis of models based on different algorithms: classical ensembling with simple averaging versus combining the results of the base algorithms with a weighted average under limited-sample conditions. The field of application of the obtained algorithm and model is medical diagnostics in medical institutions and sanatoriums during the primary diagnostic appointment; in the research task, the model is trained to classify dermatological diseases from input photographs. The novelty of the study lies in the development of an effective algorithm and image classification model based on an ensemble of convolutional neural networks that exceeds the prediction accuracy of the base classifiers; the overfitting of an ensemble of classifiers with deep architectures on a small sample is investigated, and conclusions are drawn about the design of an optimal network architecture and the choice of methods for combining the results of several base classifiers. As a result of the research, an algorithm has been developed for forming a CNN ensemble based on an effective base architecture and weighted averaging of the results of each model for the image classification task under limited-sample conditions.


Keywords:

artificial neural networks, convolutional architecture, ensembles of neural networks, methods of averaging results, classification task, medical diagnostics, neural network ensembling methods, weighted average, preprocessing, balanced voting

This article is automatically translated.

Introduction

Ensembles of neural network models can be used to increase prediction accuracy and reduce errors [1]. An ensemble is a combination of several models working in parallel or sequentially that produces a combined prediction based on the results of each individual model. Ensembles of machine learning algorithms are a powerful technique that improves the accuracy and reliability of data classification. The main idea of an ensemble is to combine several base classifiers in such a way that their individual errors are compensated and the overall performance of the system improves [2]. There are several different approaches to ensembling, each with its own advantages and disadvantages. As a rule, ensemble classifiers outperform individual base classifiers in accuracy and reliability. Ensembling of machine learning algorithms is widely used in fields such as pattern recognition, natural language processing, bioinformatics, and financial analysis. However, ensembling has some disadvantages, for example increased computational complexity, a potentially higher tendency to overfit when the sample is insufficient, and the difficulty of choosing an appropriate ensembling method. Within the framework of the study, a problem of multi-class single-label image classification is solved; model training is supervised and is carried out on a labeled data set [3]. In the course of the study we will compare the results of several models: the first classifier is based on simple averaging of the results of several models of a CNN ensemble built from deep pre-trained networks, and the second classifier will be developed on the basis of a single effective network architecture, from which several ensemble models trained on independent samples will be built, with their results combined by a weighted average.

Materials and methods

The main problem in using deep neural networks on a small amount of data is overfitting, a phenomenon in which the network predicts very well on the training sample and poorly on the test sample. The main indicator of overfitting is a very large difference in the quality metric between the training and validation samples [3]; another sign of overfitting is the invariance of the quality metric from epoch to epoch (Figure 1).

Figure 1. Stagnation of the accuracy metric during training

We see that starting from the 30th epoch of training there is practically no change in the metric; in this case the weight coefficients change only slightly, which means that we are in a region of small gradients or have reached a local minimum [5]. Another cause of overfitting is heterogeneity in the distribution of data between the test and training samples: one class has a normal ratio of negative to positive examples while another does not and has unrepresentative data, so during training the network learns to recognize the first class but not the second, and when the result is averaged across all classes the quality of the final classifier deteriorates. To prevent overfitting, early stopping of training, ensembles, gradient descent optimization methods, normalization, dropout layers, and an increase in the size of the training sample are used [6].
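As an illustration, a minimal sketch of two of the countermeasures listed above (early stopping and dropout), assuming a compiled Keras model and prepared training and validation arrays; the patience value and dropout rate are assumptions, not parameters taken from the article:

```python
from tensorflow.keras import layers, callbacks

# Stop training once the validation accuracy has not improved for 5 epochs (assumed patience)
early_stop = callbacks.EarlyStopping(monitor="val_accuracy", patience=5,
                                     restore_best_weights=True)

# A dropout layer randomly zeroes a fraction of activations, reducing co-adaptation of neurons
dropout = layers.Dropout(0.5)

# Assumed usage with an already compiled `model` and prepared arrays:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=40, batch_size=10, callbacks=[early_stop])
```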

Analysis of the results of various experiments shows that the use of ensembles of CNN models can significantly increase forecasting accuracy. This is because each model can detect certain characteristics or patterns in the data that may be overlooked by other models [7]. Combining the predictions of all models reduces the likelihood of false predictions and improves overall accuracy.

Moreover, the use of ensemble methods can help minimize the effect of overfitting, a phenomenon in which the model "remembers" the data of the training set and loses the ability to generalize to new data [8]. Ensembling takes into account various trends in the data and creates a more stable forecasting model. The principle of the ensemble is that each neural network processes the input data independently and produces its own forecast; then, by aggregating the results of all neural networks, the final forecast of the ensemble is obtained.

To date, there are many model ensembling methods, depending on the type of task. The problem under study is a multi-class single-label classification of images. Neural networks with a convolutional architecture are used for pattern recognition.

Combining predictions from multiple classifiers is a widely used strategy for improving classification accuracy. The easiest way to combine forecasts is to simply take the arithmetic mean. However, this approach may be ineffective if the individual classifiers differ greatly in accuracy [9]. A more effective way to combine forecasts is to use a weighted average, where each classifier is assigned a weight according to its accuracy. The weight of a classifier can be determined from its performance on a test dataset. Various optimization methods can be used to find the optimal weights, such as random search, differential evolution, or Nelder-Mead optimization [10]. The optimal weights may depend on the specific data set and the classifiers used. Another important factor affecting the effectiveness of an ensemble of classifiers is the diversity of the classifiers: classifiers that make similar mistakes will not complement each other and will not improve the accuracy of the ensemble. It is therefore advisable to use classifiers that rely on different classification methods or have different architectures [9].

Since most classification algorithms are based on searching for the extremum of some objective function, the solution found may lie near a local extremum. To increase the probability of finding a globally optimal solution, ensembles of models can be used that combine the results of different base classifiers built on different samples of the source data. This approach searches for a solution from different starting points, which increases the chances of obtaining an optimal result [11].

There are several ways to combine the result in an ensemble [12]:

- simple voting:

$a(x) = \frac{1}{T}\sum_{t=1}^{T} a_t(x)$;  (1)

- balanced voting:

$a(x) = \sum_{t=1}^{T} \alpha_t\, a_t(x)$,  (2)

where the weights $\alpha_t$ can be selected using linear methods;

- mixed voting:

$a(x) = \sum_{t=1}^{T} g_t(x)\, a_t(x)$,  (3)

where $g_t(x)$ is a competence function that determines where a particular algorithm is more appropriate.

In simple voting, the class that receives the majority of votes of the base classifiers is selected. Weighted voting differs from simple voting in that it assigns weights to the votes of the base classifiers according to their accuracy; a functional dependence combined with expert opinions can also be used as the weight coefficient. In addition to the voting methods, the ensemble results can be blended by averaging, with or without weights, for example when the classification result is represented by the probabilities of objects belonging to classes rather than by class labels. When averaging without weights, the final value of the ensemble is calculated as a simple mean of all the results of the base classifiers; with weighted averaging, the results obtained from the base classifiers are multiplied by the corresponding weighting coefficients [11].

To optimize neural network ensembles, boosting methods (stochastic, adaptive, gradient) are often used; they underlie the adaptive optimization methods for neural networks, such as AdaGrad, RMSprop, AdaDelta, and the adaptive moment estimation method (Adam) [13, 14, 15].

The advantages of using network ensembles are greater flexibility and the ability to achieve better model quality thanks to the variability of the different algorithms; however, the use of ensembles is costly in terms of computing power [11].

Neural network ensembles are widely used in situations where there is a huge amount of data, more than 100 thousand examples. In this case, a set of neural networks forms an ensemble that surpasses random selection in predictive accuracy for a given data density. These networks have a simple structure that allows them to be trained quickly. Their theoretical basis rests on the central limit theorem of probability theory: a sequence of averages computed over independent sets of n random variables with finite variance σ converges to a normal distribution. The ensemble of neural networks uses this principle to improve the quality of forecasts [16]:

(4)

where $P(N_i^t \mid O_j^t)$ is the conditional probability of a negative error of the i-th network at the t-th learning step given a positive error of the j-th network, $P(O_j^t \mid N_i^t)$ is the conditional probability of a positive error of the j-th network given a negative error of the i-th, and P(N), P(O) are the probabilities of a negative and a positive forecast.

By combining the predictions of several neural networks and averaging them, the standard deviation can be reduced by a factor of √n. In other words, if we combine the forecasts of individual models under certain conditions, we can reduce the uncertainty associated with each of them. Thus, the accuracy of forecasting increases with the number of neural networks used and the averaging of their output signals. It is important that the prediction errors of the individual neural networks be statistically independent. To achieve this, various methods can be used, for example training the neural networks on different datasets and using a variety of models [16].

To achieve a high-quality ensemble of neural networks, their complete statistical independence must be ensured, which means that each network must be trained on different, independent datasets. However, averaging the results of these networks may not lead to a significant decrease in the variance of the output signal. Therefore, in such cases the weighted voting method is often used: by assigning different weights to each network, an optimal result can be achieved. This approach takes into account the contribution of each network and creates a more accurate and reliable ensemble [16-20]:

$y_r(x) = \sum_{i=1}^{n} b_i\, y_i(x)$,  (5)

where $y_i(x)$ is the output of the i-th neural network (its vector of class-membership probabilities), $y_r(x)$ is the resulting class-membership probability vector, and $b_i$ is the weighting factor of the i-th network.

In this case, the variance will be determined by the formula:

$\sigma^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} b_i\, b_j\, r_{ij}\, \sigma_i\, \sigma_j$,  (6)

where $r_{ij}$ is the correlation coefficient between the outputs of the i-th and j-th neural networks and $\sigma_i$, $\sigma_j$ are the standard deviations of those outputs.

To reduce the variance of the resulting forecast, given the correlation between the output signals of the i-th and j-th neural networks on real data sets, a multilevel approximation is needed. To do this, it is important to create a multi-level ensemble architecture with 2-3 levels, where each level approximates the forecasts of the previous level. An additional condition is minimal correlation of the lower-level output signals. The weights are set independently, and special methods are used to search for the optimal weights, which show the contribution of each model to the final forecast [16].

An open data source on dermatological diseases, ISIC HAM10000, was chosen for the image classification experiments [21]. It contains 10,015 images of skin diseases. In addition to the images, a csv file with annotated image data is provided: the patient's age, the localization of the lesion, gender, the image code, and the target class, which is the diagnosis. This set was selected because it is fully labeled, the data are representative, and it is the largest set in this area. The research was conducted in Python with a hardware graphics accelerator, using machine learning and neural network development frameworks such as Keras, TensorFlow, and scikit-learn.

Results and discussion

To create an ensemble of networks, it was decided to train several convolutional neural networks with different parameters and architectures on different training samples. We read our images and converted them into tensors using the numpy library, reducing the size of the input image to lower the load on computing power. The classes were very unbalanced, so we trimmed the data of the largest class and augmented the other classes up to the size of the largest one by rotating, changing colors, and cropping images to obtain balanced target classes. We then read the resulting dataframe with the class labels and joined it with the image tensors by image code.
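A minimal sketch of the class-balancing augmentation described above, using Keras' ImageDataGenerator; this is an assumed implementation, not the authors' code, and the transformation ranges are illustrative. `images` is assumed to be a numpy array of shape (n, 75, 75, 3) holding one minority class:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=30,         # random rotations
    channel_shift_range=20.0,  # slight colour changes
    zoom_range=0.2,            # random zoom, which acts as cropping
    horizontal_flip=True,
)

def augment_to(images, target_count, batch=32):
    """Generate augmented copies until the class reaches `target_count` images."""
    out = [images]
    flow = augmenter.flow(images, batch_size=batch, shuffle=True)
    n = len(images)
    while n < target_count:
        aug = next(flow)
        out.append(aug)
        n += len(aug)
    return np.concatenate(out)[:target_count]
```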

Data preprocessing is also very important for improving the quality of the future classifier, so during the analysis the resulting data set was checked for anomalies, categorical variables were transformed, the data were converted, and duplicates were removed [22, 23]. Using the train_test_split method, we divided the sample into training and test sets, with 80% for training and 20% for testing. The data were then normalized and standardized. We further divided the training sample into training and validation parts in the proportion of 90% training and 10% validation, and also split the training sample into three independent sets with a 20% probability of selection for each model.
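A minimal sketch of the splitting scheme described above (80/20 train/test, then 90/10 train/validation, then independent subsamples for the ensemble members), assuming `X` holds the image tensors and `y` the encoded class labels; the random seeds and the exact subsampling call are assumptions:

```python
from sklearn.model_selection import train_test_split

# 80% training / 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 90% training / 10% validation within the training part
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)

# One independent ~20% subsample per ensemble member (matching the
# "20% probability of selection" mentioned above)
member_sets = []
for seed in range(3):
    X_m, _, y_m, _ = train_test_split(X_tr, y_tr, train_size=0.2, random_state=seed)
    member_sets.append((X_m, y_m))
```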

To minimize the correlation between the predictions of the individual models, we trained them on different independent samples. The first model was designed according to the transfer learning method [24]: we loaded the architectures and ImageNet weights of several pre-trained models with different architectures (in order to minimize error variance) that have shown good results in pattern recognition, namely InceptionV3, InceptionResNetV2, and VGG16 from the Keras and TensorFlow frameworks [25, 26, 27]. We set the input image size to 75x75 pixels due to limited computing power, froze the layers of the pre-trained models, and replaced the last layer with one matching the number of target classes, of which there are 7 in our task [28, 29]. Figure 2 shows the implementation scheme of transfer learning.

Figure 2. Implementation of transfer learning
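A minimal sketch of the transfer-learning branch described above: ImageNet backbones with frozen layers and a new 7-class softmax head on 75x75 inputs. This is an assumed reconstruction of what Figure 2 shows, and the GlobalAveragePooling2D head is an assumption:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3, InceptionResNetV2, VGG16

def build_branch(backbone_cls, n_classes=7, input_shape=(75, 75, 3)):
    """Frozen ImageNet backbone with a new classification head for 7 target classes."""
    backbone = backbone_cls(weights="imagenet", include_top=False, input_shape=input_shape)
    backbone.trainable = False                       # freeze the pre-trained layers
    x = layers.GlobalAveragePooling2D()(backbone.output)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(backbone.input, out)

branches = [build_branch(cls) for cls in (InceptionV3, InceptionResNetV2, VGG16)]
for m in branches:
    m.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```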

We compiled the obtained models, choosing the Adam optimizer, categorical_crossentropy as the loss function, and accuracy as the metric, and trained each model for 40 epochs with a batch size of 10 on different samples. Each model showed a different result; the InceptionV3 model showed the worst result, 53% accuracy, since it has the deepest architecture and the largest number of parameters, so training on a small amount of data yields poor quality. To combine the results by simple voting, we used the Average layer to average the output signal of each model. Figures 3 and 4 show the implementation of this method of combining the results.

Figure 3. Implementation of simple averaging of pre-trained models

Figure 4. Ensemble architecture with simple averaging

We created a new model architecture: we added a new Input layer for the input image, took the updated output layer of each model, and passed them to the Average layer. To prevent overfitting, a dropout layer was added, followed by an output fully connected layer with 7 target classes and a softmax activation function. The resulting architecture was compiled and trained in the same way. At the 50th epoch of training, the accuracy on the training sample was 71% and the accuracy on the validation sample was 76%. Figure 5 shows the learning process. The accuracy on the test sample was 74% (Figure 6).
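A minimal sketch of the simple-averaging ensemble just described, reusing the `branches` list from the previous sketch; this is an assumed reconstruction of Figures 3 and 4 rather than the authors' exact code, and the dropout rate is an assumption:

```python
from tensorflow.keras import layers, models

inp = layers.Input(shape=(75, 75, 3))
outputs = [branch(inp) for branch in branches]   # each pre-trained branch predicts independently
avg = layers.Average()(outputs)                  # simple (unweighted) averaging of the outputs
x = layers.Dropout(0.3)(avg)                     # dropout to reduce overfitting
out = layers.Dense(7, activation="softmax")(x)   # 7 target classes

ensemble = models.Model(inp, out)
ensemble.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```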

Figure 5. The learning process of an ensemble with simple averaging

Figure 6. Assessment of the quality of the ensemble model on a test sample

However, we see that there have been no changes in the quality of training in the final epochs, which indicates that the model overfit: the training sample contained fewer than 10 thousand examples, which is insufficient for training neural networks with a very deep architecture. It is therefore more suitable for this task to develop our own architecture with a minimum number of convolution levels; the ensemble should contain more than 3 classifiers and have a hierarchical architecture. We will use this lighter architecture together with the algorithm for forming a weighted-average ensemble [14]. In the course of the experimental research, an algorithm for forming a CNN ensemble was developed.

The first step was to design and evaluate the quality of an effective CNN architecture as the base model. Figure 7 shows the architecture of the effective CNN.

Figure 7. Efficient CNN architecture

This model consists of three convolutional levels, each of which has two convolutional layers, a pooling layer, and a dropout layer. The hyperparameters of the network were selected experimentally: the number of filters doubles on each subsequent level, the kernel size starts at 7x7 with 3x3 kernels applied afterwards, and activation functions and layer types alternate; at the end, a Flatten layer, normalization, dropout, one fully connected layer, and one output fully connected (Dense) layer are added to classify the 7 target classes (a sketch of this architecture is given after the list of recommendations below). In the course of the study, recommendations were developed for designing an effective neural network classifier architecture [30-33]:

- data preprocessing is necessary, including analysis of the class balance; in the case of a small sample size, data enrichment is required;

- the higher the dimension of the input image, the more convolution levels need to be used;

- the choice of optimizer when compiling the network is also important: with many training epochs the SGD optimizer should be used, otherwise Adam;

- convolution layers should alternate with a pooling layer; the number of convolution layers per level is optimally 1 to 2 depending on the complexity of the input image (for huge amounts of data, more than 100 thousand examples, more than 3 convolutions can be used), and the number of pooling layers per convolution level is 0 to 1, with a (2, 2) kernel and the MaxPooling2D function;

- in the last layers, the number of fully connected layers should be reduced: one fully connected layer plus one output fully connected layer is optimal;

- to reduce the likelihood of overfitting, Dropout and BatchNormalization layers should be used;

- an important parameter of the convolution layer is the size of the convolution filter kernel, which can take values (3, 3), (5, 5), (7, 7); the more complex the input image, the larger the filter size of the convolutional layer should be;

- the size of the input image (input_shape) on the input layer must be square;

- the padding and stride parameters of the convolutional layer must be integer and equal to the values of the previous layer if it was not also convolutional;

- the number of filters in the convolutional layer is set optimally: the larger the filter kernel, the larger the number of filters, and it doubles with each subsequent convolution level.
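A minimal sketch of the efficient base CNN described before the list of recommendations; the exact filter counts, dropout rates, and the size of the fully connected layer are assumptions, since only the overall layer pattern is specified above:

```python
from tensorflow.keras import layers, models

def build_base_cnn(input_shape=(75, 75, 3), n_classes=7):
    model = models.Sequential([layers.Input(shape=input_shape)])
    filters, kernel = 32, (7, 7)            # the first level starts with a 7x7 kernel
    for level in range(3):                  # three convolution levels
        model.add(layers.Conv2D(filters, kernel, padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(0.25))
        filters *= 2                        # double the number of filters at each level
        kernel = (3, 3)                     # subsequent levels use 3x3 kernels
    model.add(layers.Flatten())
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(128, activation="relu"))           # one fully connected layer
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(n_classes, activation="softmax"))  # output layer for 7 classes
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```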

The study revealed the important parameters of the neural network architecture that affect its effectiveness during training: the size of the convolutional layer filter, the order in which the network layers alternate, the number of convolution layers at one level, the number of filters (the filters parameter of the convolutional layer), and the number of convolution levels.

The next step was to train the model; it showed a fairly good result, almost 90% accuracy on the training sample and 78% on the test sample (Figure 8). We applied this architecture in an ensemble with an optimized weight search and weighted averaging.

Figure 8. Change in accuracy during training of the efficient CNN

The SciPy library contains a large number of optimization methods, for example the differential_evolution method [34]. It finds the global minimum of a multidimensional function without using gradient methods and is stochastic in nature; its inputs are a function that evaluates a set of weights, the bounds of the weights, the number of models in the ensemble, the test set X and y, and the number of iterations. By applying this method, the variance of the output signal of the ensemble models can be minimized.

A model training function is implemented in which the architecture is specified and a separate set of training data is generated for each model to ensure the independence of the ensemble models during training. Such models are trained on different samples, and the architectures themselves should have different parameters, which increases the ensemble's adaptability to changes in the input data. The next step is to implement the functions for evaluating the quality of the ensemble and the set of weights [35]; Figure 9 shows the main functions of the algorithm implemented in Python.

Figure 9. Implementation of the main functions of the weighted assessment method

To implement the algorithm for finding the optimal weights, an optimization target function was required, so the evaluation function loss_function was created. It takes as input a vector of weights w, the number of ensemble models m, and the test sets X_test and y_test. The weights are normalized by a second function, normalize, which uses the numpy norm: each original weight is divided by the norm of the weight vector. At the end, loss_function computes the share of true negative responses, calculated as 1 minus the accuracy of the positive prediction. For this it calls evaluate_ensemble, which receives the number of models, the normalized weight values, and the test set, and computes the accuracy of the positive prediction using accuracy_score, to which the actual classes of the test set and the predicted classes are passed. To determine the predicted class, the ensemble_predictions function is called: for each model, a forecast on the test set is produced with the predict method, creating an array of class-membership probabilities; then the tensordot method forms the scalar tensor product of the probabilities and the weights along the model axis. That is, the method receives a matrix of size 5x7 (5 ensemble models and 7 target classes, with a probability produced by each model for each class), transposes it so that the 5 model rows become columns for the 7 classes, multiplies each model's vector by its weight, and sums the weighted signals over the models. The result is 7 weighted class probabilities; the maximum value is selected from this array and its index is taken with the argmax method. As a result, we obtain the predicted class taking the given weights into account.
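A minimal sketch of the functions just described (an assumed reconstruction of Figure 9 rather than the authors' exact code; here the list of trained models `members` is passed instead of their count, and an L1 norm is assumed for the normalization):

```python
import numpy as np
from numpy.linalg import norm
from sklearn.metrics import accuracy_score

def normalize(weights):
    """Divide each weight by the norm of the weight vector."""
    total = norm(weights, 1)
    return weights if total == 0.0 else weights / total

def ensemble_predictions(members, weights, X_test):
    """Weighted-average prediction: tensordot of member probabilities and weights."""
    probs = np.array([model.predict(X_test) for model in members])  # (n_models, n_samples, 7)
    summed = np.tensordot(probs, weights, axes=((0,), (0,)))        # weighted sum over the models
    return np.argmax(summed, axis=1)                                # index of the most likely class

def evaluate_ensemble(members, weights, X_test, y_test):
    y_pred = ensemble_predictions(members, weights, X_test)
    return accuracy_score(np.argmax(y_test, axis=1), y_pred)        # y_test assumed one-hot

def loss_function(weights, members, X_test, y_test):
    """Share of wrong answers: 1 minus the accuracy under the normalized weights."""
    return 1.0 - evaluate_ensemble(members, normalize(weights), X_test, y_test)
```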

The weight evaluation function in our case is a loss function: it determines the proportion of true negative results that degrade quality and is defined as 1 minus the accuracy of the positive forecast. Figure 10 shows a block diagram of the algorithm for finding the optimal weights.

Figure 10. Block diagram of the weighted ensemble formation algorithm

The number of models in the ensemble is set, and each model is trained on its own independent dataset. Using the evaluate method, each model is evaluated and its accuracy is calculated. A list of weights is formed for the models, and the accuracy of the ensemble is estimated with the ensemble_predictions function taking the specified weights into account; the weights are initially set randomly, with bounds from 0 to 1. The arguments of the weight optimization function are specified (the number of models and the test set X and y), after which the optimal weight search function differential_evolution is called; it is passed the optimization function, the weight bounds, the arguments, and the number of search iterations. Using the key x, we obtain the optimal set of weights for the models, of the form (0.01, 0.2, 0.11, 0.49, 0.12); next, we evaluate the quality of the ensemble with these optimal weights. We tested the results of the algorithm on a test dataset. As a result, the algorithm generates 5 convolutional networks and evaluates their accuracy (Table 1).
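A minimal sketch of the weight search with SciPy's differential_evolution, reusing `members`, `X_test`, `y_test`, and the functions from the previous sketch; the iteration limit and tolerance are assumptions:

```python
from scipy.optimize import differential_evolution

bounds = [(0.0, 1.0)] * len(members)          # each weight is bounded to [0, 1]
result = differential_evolution(
    loss_function,                            # objective: 1 minus ensemble accuracy
    bounds,
    args=(members, X_test, y_test),           # extra arguments forwarded to loss_function
    maxiter=1000,
    tol=1e-7,
)
optimal_weights = normalize(result.x)         # result.x holds the best weight vector found
print("ensemble accuracy:", evaluate_ensemble(members, optimal_weights, X_test, y_test))
```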

Table 1. Results of weighted estimates of ensemble models

Table 2 shows a comparison of the obtained estimates of the accuracy of each model.

Table 2. Evaluation of the quality of the obtained models

The model generates an accuracy result taking the weighted estimates into account; the accuracy score with the optimal weights is 0.83, which is higher than that of the model based on an ensemble of deep pre-trained networks. The algorithm finds the optimal weights and shows the weight significance of each ensemble model; the obtained weights are passed to the ensemble_predictions function, producing predictions that account for the optimal weights. Based on the obtained weights, an accuracy estimate of the combining classifier is formed; it reaches 83%, which is 5% higher than a single efficient CNN architecture and 9% higher than an ensemble of deep pre-trained models with simple averaging of the results.

We tested the operation of the model on a test example: we loaded test images of the dermatological diseases melanocytic nevus and dermatofibroma and reduced their size to (75, 75), since the model was trained on this size; the images were then converted to a tensor representation using numpy.asarray, and to achieve a mean of 0 and a variance of 1 the resulting tensor was normalized by subtracting the mean and dividing by the standard deviation. Figure 11 shows the software implementation of image loading and processing.

Figure 11. Loading and processing the input image
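A minimal sketch of the preprocessing shown in Figure 11 (an assumed implementation; the file name is hypothetical): the photograph is resized to 75x75 and standardized to zero mean and unit variance before being passed to the model:

```python
import numpy as np
from PIL import Image

def load_and_preprocess(path, size=(75, 75)):
    img = Image.open(path).convert("RGB").resize(size)
    x = np.asarray(img, dtype="float32")
    x = (x - x.mean()) / x.std()          # zero mean, unit variance
    return np.expand_dims(x, axis=0)      # add the batch dimension expected by predict()

sample = load_and_preprocess("test_nevus.jpg")   # hypothetical file name
```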

Then we wrote Python code for predicting the disease class and displaying a histogram of the probability distribution across the target classes. Figure 12 shows the software implementation for two test photos.

Figure 12. Software implementation of the prediction function and probability distribution

During development, a dictionary with encoded target classes was created at the stage of preprocessing and transforming the metadata dataset. The encoded classes are presented in this dictionary. Using the model's predict method, we predicted the probabilities of a photograph belonging to each class: for the first image the highest probability is 0.84 at index 4, and for the second 0.94 at index 3. Using the argmax method, we obtained the index of the maximum probability value in the array (4 and 3, respectively), and from the obtained index and the matching dictionary key the description of the diagnosis was determined: for the first image the model correctly predicted melanocytic nevus, for the second dermatofibroma, with probabilities of 84% and 94%, respectively. It is important for the end user to be able to interpret how the neural network arrived at its outcome, since it is often a "black box" model, so a histogram of the probability distribution shows why the model reached this result; an even more informative tool for interpreting neural network results is the construction of GradCAM heat maps showing the key areas of the image on which the conclusions were based. To implement the histogram, we made a forecast and obtained a list of indices and probability values, created a dictionary in which the key is the index and the value is the probability for each class, and for convenience created a dataframe with the key and probability value. Using the dataframe's map method, we obtained the class names by the index equal to the key of the descriptive dictionary. Using the matplotlib.pyplot library, we constructed the histograms of the probability distribution shown in Figure 13.

Figure 13. Histograms of probability distribution
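A minimal sketch of the prediction and histogram step (an assumed implementation of what Figures 12 and 13 show; `sample` comes from the previous sketch, `ensemble` stands in for the final trained model, and the class dictionary is a placeholder for the encoding produced during metadata preprocessing):

```python
import pandas as pd
import matplotlib.pyplot as plt

class_names = {i: f"class_{i}" for i in range(7)}     # replace with the real diagnosis dictionary

probs = ensemble.predict(sample)[0]                   # probabilities for the 7 target classes
predicted = int(probs.argmax())                       # index of the most probable class
print("diagnosis:", class_names[predicted], "probability:", probs[predicted])

df = pd.DataFrame({"class_index": range(len(probs)), "probability": probs})
df["class_name"] = df["class_index"].map(class_names) # map indices to class names

df.plot.bar(x="class_name", y="probability", legend=False)
plt.ylabel("probability")
plt.show()
```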

The histograms show which class is the most likely. The obtained research results and models will be used to develop an intelligent adaptive recommendation system that can be reconfigured and retrained to detect new cases. Figure 14 shows the interface of the diagnostic window based on photographs.

Figure 14. Interface of the diagnostic window based on photographs

Thus, we have developed an algorithm for forming an effective CNN ensemble with a weighted-average combination of results, developed recommendations for designing an effective base CNN architecture, and studied an ensemble of neural networks with deep architectures pre-trained on ImageNet, combined by simple averaging and built with the transfer learning method, on a small training sample; that model overfit and showed a lower result than the other models. This problem was solved by enriching the sample and distributing the training and test samples evenly across the classes.

Conclusion

During the study, we implemented several CNN ensembles using algorithms for averaging and combining the results of the models. The first algorithm, based on simple arithmetic averaging of the ensemble models, turned out to be non-optimal for the problem being solved, since it uses neural networks with a very deep architecture and training was performed on a small sample; as a result, the network overfit, the quality of the final classifier decreased, and its accuracy reached 74% on the test sample. The second model is implemented on the basis of an ensemble of models with an efficient network architecture and a weighted-average combination of results; it showed a better result than the previous model and achieved an accuracy of 83%, which is 9% higher. Thus, we experimentally tested the results of the constructed models, developed an algorithm for forming an effective CNN ensemble using averaging methods, studied the effect of network overfitting on the final result, formed recommendations for building an effective base CNN classifier, and built the architecture and CNN model on the basis of these algorithms; the results obtained will be used to develop an intelligent medical decision support system.

References
1. Thoma, M. (2017). Analysis and optimization of convolutional neural network architectures.
2. Cruz, Y. J., Rivas, M., Quiza, R., Villalonga, A., Haber, R. E., & Beruvides, G. (2021). Ensemble of convolutional neural networks based on an evolutionary algorithm applied to an industrial welding process. Computers in Industry, 133, 103530.
3. Yang, S., Chen, L. F., Yan, T., Zhao, Y. H., & Fan, Y. J. (2017, May). An ensemble classification algorithm for convolutional neural network based on AdaBoost. In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS) (pp. 401-406). IEEE.
4. Basili, V. R., Briand, L. C., & Melo, W. L. (1996). A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering, 22(10), 751-761.
5. Neural networks. Overfitting: what it is and how to avoid it, criteria for stopping training. Retrieved from https://proproprogs.ru/neural_network/pereobuchenie-chto-eto-i-kak-etogo-izbezhat-kriterii-ostanova-obucheniya
6. Voronetsky, Yu. O., & Zhdanov, N. A. (2019). Methods of combating overfitting of artificial neural networks. Scientific Aspect, 2. Retrieved from https://na-journal.ru/2-2019-tehnicheskie-nauki/1703-metody-borby-s-pereobucheniem-iskusstvennyh-neironnyh-setei
7. Li, C., Tao, Y., Ao, W., Yang, S., & Bai, Y. (2018). Improving forecasting accuracy of daily enterprise electricity consumption using a random forest based on ensemble empirical mode decomposition. Energy, 165, 1220-1227.
8. Omisore, O. M., Akinyemi, T. O., Du, W., Duan, W., Orji, R., Do, T. N., & Wang, L. (2022). Weighting-based deep ensemble learning for recognition of interventionalists' hand motions during robot-assisted intravascular catheterization. IEEE Transactions on Human-Machine Systems, 53(1), 215-227.
9. The ensembling of neural network models using the Keras library. Retrieved from https://se.moevm.info/lib/exe/fetch.php/courses:artificial_neural_networks:pr_8.pdf
10. The Nelder-Mead optimization method. An example of a Python implementation. Retrieved from https://habr.com/ru/articles/332092/
11. Klyueva, I. A. (2021). Methods and algorithms for ensembling and searching for values of classifier parameters (candidate dissertation). Ryazan State Radio Engineering University named after V.F. Utkin, Ryazan. Retrieved from https://dissov.pnzgu.ru/files/dissov.pnzgu.ru/2021/tech/klyueva/dissertaciya_klyuevoy_i_a_.pdf
12. Mikryukov, A. A., Babash, A. V., & Sizov, V. A. (2019). Classification of events in information security systems based on neural networks. Open Education, 23(1), 57-63.
13. Gizluk, D. (2020). Adaptive optimization methods. Neural networks are simple, 7. Retrieved from https://www.mql5.com/ru/articles/8598#para21
14. Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Boosting algorithms as gradient descent. Advances in Neural Information Processing Systems, 12.
15. Zaheer, R., & Shaziya, H. (2019, January). A study of the optimization algorithms in deep learning. In 2019 Third International Conference on Inventive Systems and Control (ICISC) (pp. 536-539). IEEE.
16. Staroverov, B. A., & Khamitov, R. N. (2023). Implementation of deep learning for forecasting using an ensemble of neural networks. Proceedings of Tula State University. Technical Sciences, 4, 185-189.
17. Onan, A., Korukoğlu, S., & Bulut, H. (2016). A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification. Expert Systems with Applications, 62, 1-16.
18. Kim, H., Kim, H., Moon, H., & Ahn, H. (2011). A weight-adjusted voting algorithm for ensembles of classifiers. Journal of the Korean Statistical Society, 40(4), 437-449.
19. Yao, X., & Islam, M. M. (2008). Evolving artificial neural network ensembles. IEEE Computational Intelligence Magazine, 3(1), 31-42.
20. Anand, V., Gupta, S., Gupta, D., Gulzar, Y., Xin, Q., Juneja, S., ... & Shaikh, A. (2023). Weighted average ensemble deep learning model for stratification of brain tumor in MRI images. Diagnostics, 13(7), 1320.
21. The International Skin Imaging Collaboration. Retrieved from https://www.isic-archive.com
22. Alexandropoulos, S. A. N., Kotsiantis, S. B., & Vrahatis, M. N. (2019). Data preprocessing in predictive data mining. The Knowledge Engineering Review, 34, e1.
23. García, S., Luengo, J., & Herrera, F. (2015). Data preprocessing in data mining (Vol. 72, pp. 59-139). Cham, Switzerland: Springer International Publishing.
24. Liang, G., & Zheng, L. (2020). A transfer learning method with deep residual network for pediatric pneumonia diagnosis. Computer Methods and Programs in Biomedicine, 187, 104964.
25. InceptionV3. Retrieved from https://keras.io/api/applications/inceptionv3/
26. InceptionResNetV2. Retrieved from https://keras.io/api/applications/inceptionresnetv2/
27. VGG16. Retrieved from https://keras.io/api/applications/vgg/#vgg16-function
28. Shchetinin, E. Y. (2021). On some methods of image segmentation using convolutional neural networks. Information and Telecommunication Technologies and Mathematical Modeling of High-Tech Systems, 507-510.
29. Rosebrock, A. (2019). Change input shape dimensions for fine-tuning with Keras. AI & Computer Vision Programming. Retrieved from https://pyimagesearch.com/2019/06/24/change-input-shape-dimensions-for-fine-tuning-with-keras/
30. Kostin, K. A. et al. (2017). Adaptive pathology classifier for computer diagnostics of diseases using convolutional neural networks based on medical images and video data (master's dissertation). National Research Tomsk State University, Tomsk.
31. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
32. Wang, J., Lin, J., & Wang, Z. (2017). Efficient hardware architectures for deep convolutional neural network. IEEE Transactions on Circuits and Systems I: Regular Papers, 65(6), 1941-1953.
33. Phung, V. H., & Rhee, E. J. (2019). A high-accuracy model average ensemble of convolutional neural networks for classification of cloud image patches on small datasets. Applied Sciences, 9(21), 4500.
34. The differential evolution method. Retrieved from https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html
35. Scraper. (2018). How to develop a weighted average ensemble for deep learning neural networks. Retrieved from https://machinelearningmastery.ru/weighted-average-ensemble-for-deep-learning-neural-networks/

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

The reviewed work is devoted to the research and development of algorithms for forming an effective ensemble of convolutional neural networks for image classification by solving the problem of multi-class single-label image classification through supervised training of a model on a labeled data set. The research methodology is based on the application of the cybernetic approach and the concept of artificial intelligence to pattern recognition using artificial neural networks and the creation of ensembles of machine learning models. Experiments on image classification were conducted using materials from an open data source on dermatological diseases presented in csv format; the study was carried out in Python with a hardware graphics accelerator and with machine learning and neural network development frameworks (Keras, TensorFlow, sklearn). The authors rightly attribute the relevance of the work to the fact that ensembles of neural network models can be used to increase prediction accuracy and reduce errors, and that they are widely used in various fields, including image recognition in intelligent medical decision support systems. The scientific novelty of the reviewed study, in the reviewer's opinion, consists in the development of algorithms for ensembling convolutional neural network models using averaging methods and of recommendations on applying the results to the development of an intelligent medical decision support system. The following sections are highlighted in the text of the article: Introduction, Materials and methods, Results and discussion, Conclusion, Bibliography. In the introduction, the relevance of the topic is substantiated, and the advantages and disadvantages of ensembling in machine learning are noted. Next, the problem of overfitting when using deep neural networks on a small amount of data is outlined: a phenomenon in which the network predicts very well on the training sample and poorly on the test one, resulting in a large difference in the quality metric between the training and validation samples. The publication discusses the principle of the ensemble, which is that each neural network processes the input data independently and issues its own forecast, after which the results of all neural networks are aggregated into a final forecast in which individual errors are compensated and the overall performance of the system improves. Several ways of combining the result in an ensemble are given: simple voting, weighted voting, and mixed voting; a comparison of the obtained accuracy estimates of the three models is presented. The article is illustrated with two tables and 14 figures and contains 6 formulas. The bibliographic list includes 35 sources, scientific publications of domestic and foreign authors in Russian and English as well as online resources on the topic under consideration, to which the text contains references confirming an engagement with opposing views. Among the reserves for improving the publication, the following can be noted. First, the authors are invited to consider correcting the title of the article without the preposition "k": "Research and development of algorithms for the formation of ...". Secondly, it is better to place the table headers in accordance with the accepted rules, before the tables rather than after them.
The reviewed material corresponds to the scope of the journal "Software Systems and Computational Methods", reflects the progress and results of the authors' work on creating an artificial intelligence system, and will interest readers; therefore, after some revision in accordance with the expressed wishes, the article is recommended for publication.