Knowledge Base, Intelligent Systems, Expert Systems, Decision Support Systems
Reference:
Alpatov A.N., Terloev E.Z., Matchin V.T.
Architecture of a three-dimensional convolutional neural network for detecting the fact of falsification of a video sequence
// Software systems and computational methods.
2024. № 3.
P. 1-11.
DOI: 10.7256/2454-0714.2024.3.70849 EDN: MNOVWB URL: https://en.nbpublish.com/library_read_article.php?id=70849
Abstract:
The article examines the use of neural network technologies to detect falsification of the contents of video sequences. New technologies have become an integral part of the multimedia environment, but their proliferation has also created a new threat: they can be misused to falsify the contents of video sequences. This leads to serious problems, such as the spread of fake news and the misinforming of society. The article examines this problem and argues that neural networks are needed to solve it. Compared with other existing models and approaches, neural networks detect video falsification with high efficiency and accuracy because they can extract complex features and learn from large amounts of source data, which is especially important when the resolution of the analyzed video sequence is reduced. The work presents a mathematical model for identifying falsification of the audio and video sequences in video recordings, as well as a model based on a three-dimensional convolutional neural network that detects falsification of a video sequence by analyzing the contents of individual frames. The problem of identifying falsifications in video recordings is treated as the joint solution of two problems, identifying falsification of the audio sequence and of the video sequence, and the resulting problem is reduced to a classical classification problem. Any video recording can be assigned to one of the four groups described in the work. Only videos belonging to the first group are considered authentic; all others are considered fabricated. To increase the flexibility of the model, probabilistic classifiers have been added, which makes it possible to take into account the degree of confidence in the predictions.
The distinguishing feature of the resulting solution is that its threshold values can be adjusted, which allows the model to be adapted to different levels of strictness depending on the task. An architecture of a three-dimensional convolutional neural network, comprising a preprocessing layer and a neural network layer, is proposed for detecting fabricated image sequences. The resulting model detects falsified video sequences with sufficient accuracy even when frame resolution is significantly reduced. Testing of the model on a training dataset showed a proportion of correct detections of video sequence falsification above 70%, which is noticeably better than guessing. Despite this sufficient accuracy, the model can be refined to further increase the proportion of correct predictions.
Keywords:
batch normalization, anomaly detection, data preprocessing, audio falsification, deepfake detection, deepfakes, video falsification, convolutional neural networks, neural networks, machine learning
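The decision rule outlined in this abstract (two probabilistic classifiers, adjustable thresholds, four groups of which only the first is authentic) might be sketched roughly as follows. This is an illustrative sketch, not the authors' model: the abstract does not enumerate the four groups, so the group names and numbering after group 1, the function name `classify`, and the default thresholds of 0.5 are all assumptions.

```python
from enum import Enum


class Group(Enum):
    # Only "group 1 is authentic" comes from the abstract;
    # the meaning of groups 2-4 is assumed for illustration.
    AUTHENTIC = 1
    AUDIO_FALSIFIED = 2
    VIDEO_FALSIFIED = 3
    BOTH_FALSIFIED = 4


def classify(p_audio_fake: float, p_video_fake: float,
             audio_threshold: float = 0.5,
             video_threshold: float = 0.5) -> Group:
    """Map two falsification probabilities to one of four groups.

    Each probability is the (hypothetical) output of a separate
    probabilistic classifier; a stream is flagged as falsified when
    its probability reaches the corresponding threshold.
    """
    for p in (p_audio_fake, p_video_fake):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    audio_flag = p_audio_fake >= audio_threshold
    video_flag = p_video_fake >= video_threshold
    if audio_flag and video_flag:
        return Group.BOTH_FALSIFIED
    if video_flag:
        return Group.VIDEO_FALSIFIED
    if audio_flag:
        return Group.AUDIO_FALSIFIED
    return Group.AUTHENTIC
```

Raising a threshold makes the rule more lenient for that stream: under these assumptions, `classify(0.1, 0.6)` flags the video sequence, while `classify(0.1, 0.6, video_threshold=0.7)` treats the same recording as authentic.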
Parallel algorithms for numerical analysis
Reference:
Zelenskii A.A., Gribkov A.A.
Configuration of memory-oriented motion control system
// Software systems and computational methods.
2024. № 3.
P. 12-25.
DOI: 10.7256/2454-0714.2024.3.71073 EDN: TTQBBA URL: https://en.nbpublish.com/library_read_article.php?id=71073
Abstract:
The paper investigates options for configuring the control cycle, i.e., determining how the time intervals required to execute individual control operations are distributed across execution threads so that control remains realizable. The object of research is control systems with an object-oriented architecture, which assumes a combined vertical-horizontal integration of functional blocks and modules that distribute all control tasks among themselves. This architecture is implemented by means of an actor instrumental model using metaprogramming. Such control systems are best at reducing control cycle time by performing computational and other control operations in parallel. Several approaches to control cycle configuration are considered: without optimization, with combinatorial optimization in time, and with combinatorial optimization in system resources. A near-optimal configuration can also be obtained by using adaptive configuration. The research shows that the control system cycle configuration problem has several solutions. Obtaining a solution in practice in the case of combinatorial optimization involves significant difficulties because of the high algorithmic complexity of the problem and the large amount of computation required, which grows rapidly as the number of operations at the stages of the control cycle increases. A possible means of overcoming these difficulties is the use of stochastic methods, which sharply reduce the required amount of computation. The complexity of configuring the control system cycle can also be reduced significantly by adaptive configuration, which can be implemented in two variants. The first is real-time configuration of the control system cycle. The second is the determination of a quasi-optimal configuration on the basis of multiple configurations with different initial data and subsequent comparison of the results obtained.
Keywords:
control operations, elements, loop, optimization, configuration, adaptive, sorting methods, execution threads, memory-oriented, control system
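To make the idea of a stochastic alternative to full combinatorial optimization concrete, the sketch below assigns operations to execution threads by random local search. It is not the method of the paper: it assumes independent operations with known durations, ignores the stages of the control cycle, and takes the cycle time to be the load of the busiest thread. The function names and the round-robin starting point are likewise assumptions made for illustration.

```python
import random


def cycle_time(assignment, durations, n_threads):
    """Cycle time under the assumed model: load of the busiest thread."""
    loads = [0.0] * n_threads
    for op, thread in enumerate(assignment):
        loads[thread] += durations[op]
    return max(loads)


def configure_random_search(durations, n_threads, iterations=1000, seed=0):
    """Stochastic local search over thread assignments.

    Starts from a round-robin assignment (standing in for the case
    without optimization) and repeatedly moves one randomly chosen
    operation to a randomly chosen thread, keeping the move whenever
    the cycle time does not get worse.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if not durations:
        return [], 0.0
    rng = random.Random(seed)
    best = [op % n_threads for op in range(len(durations))]
    best_time = cycle_time(best, durations, n_threads)
    for _ in range(iterations):
        candidate = list(best)
        candidate[rng.randrange(len(durations))] = rng.randrange(n_threads)
        t = cycle_time(candidate, durations, n_threads)
        if t <= best_time:
            best, best_time = candidate, t
    return best, best_time
```

An exhaustive search would examine `n_threads ** len(durations)` assignments, whereas this sketch evaluates only `iterations` candidates. It trades a guarantee of optimality for a bounded amount of computation.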
Methods, languages and forms of human-computer interaction
Reference:
Trofimova V.S., Karshieva P.K., Rakhmanenko I.A.
Fine-tuning neural networks for the features of a dataset in the speaker verification task using transfer learning
// Software systems and computational methods.
2024. № 3.
P. 26-36.
DOI: 10.7256/2454-0714.2024.3.71630 EDN: XHZCTS URL: https://en.nbpublish.com/library_read_article.php?id=71630
Abstract:
The subject of this study is neural networks trained using transfer learning methods tailored to the specific characteristics of the dataset. The object of the study is machine learning methods used for solving speaker verification tasks. The aim of the research is to improve the efficiency of neural networks in the task of speaker verification. In this work, three datasets in different languages were prepared for the fine-tuning process: English, Russian, and Chinese. Additionally, an experimental study was conducted using the modern pre-trained models ResNetSE34L and ResNetSE34V2, aimed at enhancing the efficiency of neural networks in text-independent speaker verification. The research methodology includes assessing the effectiveness of fine-tuning neural networks to the characteristics of the dataset in the speaker verification task, based on the equal error rate (EER), the point at which the Type I and Type II error rates coincide. A series of experiments was also conducted, during which parameters were varied and layer freezing techniques were applied. The maximum reduction in EER when using the English dataset was achieved by adjusting the number of epochs and the learning rate, reducing the error by 50%. Similar parameter adjustments with the Russian dataset reduced the error by 63.64%. When fine-tuning with the Chinese dataset, the lowest error rate was achieved in the experiment that involved freezing the fully connected layer, modifying the learning rate, and changing the optimizer, resulting in a 16.04% error reduction. The obtained results can be used in the design and development of speaker verification systems and for educational purposes. It was also concluded that transfer learning is effective for fine-tuning neural networks to the specific characteristics of a dataset, as a significant reduction in EER was achieved in the majority of experiments, indicating improved speaker recognition accuracy.
Keywords:
deep learning, neural networks, speech processing, feature extraction, speaker recognition, speaker verification, dataset, fine-tuning, transfer learning, pattern recognition
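For readers unfamiliar with the metric used in this abstract, the sketch below shows one common way to estimate the equal error rate from verification scores. It is a minimal illustration, not the evaluation code of the study: the score lists, the function names, and the reading of the reported reductions as relative percentages are assumptions.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from similarity scores (higher = same speaker).

    Sweeps every observed score as a decision threshold and returns the
    average of the false acceptance and false rejection rates at the
    threshold where the two are closest.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = None, None
    for thr in sorted(set(genuine) | set(impostor)):
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= thr for s in impostor) / len(impostor)
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < thr for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(before, after):
    """Relative error reduction in percent (assumed interpretation)."""
    return (before - after) / before * 100
```

With genuine scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.6, 0.3, 0.2, 0.1]`, both error rates equal 0.25 at a threshold of 0.6, so the estimated EER is 0.25. Under the assumed interpretation, a drop in EER from 0.04 to 0.02 would be a 50% reduction.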