Software systems and computational methods
Reference:
Mamadaev I.M., Minitaeva A.M. Performance optimization of machine learning-based image recognition algorithms for mobile devices based on the iOS operating system // Software systems and computational methods. 2024. No. 2. P. 86-98.
DOI: 10.7256/2454-0714.2024.2.70658 EDN: LDXKKC URL: https://en.nbpublish.com/library_read_article.php?id=70658
Performance optimization of machine learning-based image recognition algorithms for mobile devices based on the iOS operating system
DOI: 10.7256/2454-0714.2024.2.70658
EDN: LDXKKC
Received: 05-05-2024
Published: 13-06-2024

Abstract: Today, mobile devices play an important role in everyday life, and machine learning is one of the key technologies delivering significant benefits to mobile applications. Optimizing machine learning algorithms for mobile devices is an urgent and important task: it aims to develop and apply methods that make effective use of the limited computing resources of mobile devices. The paper discusses several ways to optimize image recognition algorithms on mobile devices, such as model quantization, model compression, and optimization of computations. Beyond optimizing the machine learning model itself, the paper also reviews libraries and tools for applying this technology on mobile devices. Each of the described methods has its advantages and disadvantages; therefore, the results propose not only a combination of the described options but also an additional method of parallelizing image processing. The article examines specific tools and frameworks available for optimizing machine learning performance on iOS and reports the authors' own experiments testing the effectiveness of various optimization methods. An analysis of the results and a comparison of algorithm performance are also provided. The practical significance of this article is as follows: improving the performance of machine learning algorithms on iOS mobile devices leads to more efficient use of computing resources and higher system performance, which is critical given the limited computing power and energy resources of mobile devices.
Optimizing machine learning performance on the iOS platform contributes to faster and more responsive applications, which improves the user experience and allows developers to create new and innovative features and capabilities. Expanding the applicability of machine learning on iOS mobile devices opens up new opportunities for application development in various fields such as pattern recognition, natural language processing, and data analysis.

Keywords: neural network, machine learning, mobile device, iOS, image recognition, optimization, Apple OS, efficiency, performance, parallelization

This article is automatically translated.

Introduction and relevance

Today, mobile devices play an important role in everyone's life, providing a wide range of features and services without which many can no longer imagine their daily routines. Machine learning is one of the key technologies behind the advantages of leading mobile applications [1]: it is already used in many top applications on the market, and large IT companies compete with each other to attract more customers. However, its effective use on mobile devices requires solving a number of difficult problems. Optimizing machine learning algorithms for mobile devices is an urgent and important task aimed at developing and applying methods that use the limited computing resources of mobile devices effectively, minimize power consumption, and achieve high performance on complex machine learning workloads. Such optimizations open up new opportunities for applications such as smart assistants and voice assistants, real-time image and video processing, and automatic data classification.
Along with the growing popularity of Apple's mobile devices [2], there is a growing need for machine learning algorithms to run efficiently given limited computing resources, low memory, and the modest battery capacity dictated by device size. Analyzing machine learning performance problems on mobile devices running the iOS operating system, the following aspects can be identified:
- delays in algorithm execution due to their complexity;
- reduced responsiveness of the user interface when the device's computing power is overloaded;
- increased energy consumption and, as a result, increased heat dissipation.
These aspects degrade the user experience and set developers the task of ensuring high application performance. The purpose of this work is to research and optimize the performance of machine-learning-based image recognition algorithms on iOS mobile devices. The main tasks are to study existing optimization methods and techniques, to analyze the performance of various machine learning algorithms, and to assess the impact of various factors on performance. The article examines specific tools and frameworks available for optimizing machine learning performance on iOS and reports the authors' own experiments testing the effectiveness of various optimization methods. It also provides an analysis of the results and a comparison of algorithm performance. The practical significance of this article is as follows:
- Improving the performance of machine learning algorithms on iOS mobile devices leads to more efficient use of computing resources and improved system performance, which is critical given the limited computing power and energy resources of mobile devices.
- Optimizing machine learning performance on the iOS platform contributes to faster and more responsive applications, which improves the user experience and allows developers to create new and innovative features and capabilities.
- Expanding the applicability of machine learning on iOS mobile devices opens up new opportunities for application development in various fields such as pattern recognition, natural language processing, and data analysis.
1 Overview of existing solutions

On iOS mobile devices, many variations of machine learning algorithms are used to solve a wide range of tasks [3], including classification, regression, clustering, neural networks, and deep learning [4]. Prominent examples are classification algorithms such as logistic regression and the support vector machine, which are widely used for pattern recognition and data classification on mobile devices. These methods have relatively low complexity and scale well to large amounts of data [5]. Regression algorithms, which include linear regression and the least squares method, are used to predict numerical values from raw data; they are widely applied in forecasting and data analysis tasks on mobile devices. Clustering, in turn, is a method of grouping similar objects based on their characteristics; clustering algorithms such as k-means and DBSCAN are used to process data on mobile devices and to search for hidden structure. Neural networks and deep learning remain among the most popular machine learning approaches today, as they can process complex data, images, and texts while achieving high accuracy in classification, recognition, and content generation tasks. Proper use of optimization techniques can significantly improve the efficiency of machine learning algorithms on mobile devices.
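To illustrate the clustering approach mentioned above, the following is a minimal k-means sketch in NumPy; the data, the number of clusters, and the initialization are illustrative choices, not taken from the article:

```python
import numpy as np

def kmeans(points, k=2, iters=10, seed=0):
    """Minimal k-means: alternate point assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points as initial centroids
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, _ = kmeans(pts, k=2)
# The two nearby pairs end up in the same cluster.
assert labels[0] == labels[1] and labels[2] == labels[3]
```

On a mobile device, the same alternating assign/update loop would typically run on a small feature set rather than raw images, which is what keeps its cost acceptable.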
2 Quantization of models

One of the key optimization methods is model quantization [6]: it reduces the size of the model and its computing-resource requirements by representing weights and activations with lower precision. In other words, quantization is the process of reducing the precision of weights by rounding. A visual illustration of this process is shown in Figure 1.

Figure 1 – An example of quantizing the weights of one neuron with a 4-fold reduction in bit depth

Quantization speeds up computation and reduces memory usage while only slightly affecting model accuracy. One of its main advantages is the reduction in model size, which lowers device memory requirements and directly accelerates computation. In addition, quantization makes it possible to use specialized hardware accelerators, such as the Neural Engine [2] in Apple chips, designed, among other things, to perform low-precision operations efficiently. However, quantization has a significant drawback: it can lead to a loss of model accuracy, especially when the precision of the weight and activation representations is reduced aggressively. This disadvantage is offset by a useful property of the method: it can be applied both during model training and afterwards, which means such operations can be performed even after the application has been delivered to users.
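The rounding step described above can be sketched as a simple symmetric linear weight quantizer. This is a minimal NumPy illustration of the idea, not the scheme CoreML or the Neural Engine actually uses; the weight values are invented for the example:

```python
import numpy as np

def quantize_weights(w, n_bits=8):
    """Linearly quantize float weights to signed n-bit integers and back.

    Returns the dequantized approximation and the scale factor used.
    The per-weight rounding error is bounded by scale / 2.
    """
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 127 for 8-bit signed
    scale = np.max(np.abs(w)) / qmax          # map the largest weight to qmax
    q = np.round(w / scale).astype(np.int32)  # integer codes actually stored
    return q.astype(np.float32) * scale, scale

w = np.array([0.82, -0.41, 0.07, -1.27], dtype=np.float32)
w_hat, scale = quantize_weights(w, n_bits=8)
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-7
```

Storing the integer codes `q` instead of 32-bit floats is what yields the 4-fold size reduction mentioned for Figure 1 (8 bits per weight instead of 32).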
3 Compression of models

Another optimization method is model compression, which reduces the size of the model by removing unnecessary or redundant parameters. One type of model compression is "pruning" [7] (clipping or thinning). Graphs of model accuracy and performance as a function of the pruning percentage are shown in Figure 2.

Figure 2 – Graphs of accuracy and performance during pruning (thinning) of models

Unlike quantization, pruning can only be applied to an already trained model. Compressing models in this way has its advantages: it reduces the size of the model, which simplifies deployment and speeds up loading on devices, and it can also reduce memory and computing-resource requirements. However, compression carries a risk of losing information and model accuracy: some compression methods may remove parameters or relationships that matter, which can affect the performance and effectiveness of the model. Nevertheless, despite these disadvantages, the method is considered in this article, since even with minimal thinning values, in combination with other optimization methods, it can give acceptable results in accuracy and performance.
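A common concrete form of pruning is magnitude-based thinning: zero out the fraction of weights with the smallest absolute values. The sketch below is an illustrative NumPy version of that heuristic (the threshold rule and the weight matrix are assumptions for the example, not the article's procedure):

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.

    Ties at the threshold value are also pruned, so the achieved
    sparsity can slightly exceed the requested one.
    """
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

w = np.array([[0.9, -0.05], [0.02, -0.7]], dtype=np.float32)
pruned = prune_by_magnitude(w, sparsity=0.5)
# The two smallest weights (0.02 and -0.05) are zeroed:
# pruned == [[0.9, 0.0], [0.0, -0.7]]
```

In practice the resulting zeros only save memory and time when stored in a sparse format or exploited by a sparsity-aware kernel, which is why pruning is usually paired with a deployment toolchain rather than applied alone.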
4 Optimization of calculations

Optimization of calculations is also an important aspect; it may include the use of more efficient algorithms, optimization of computational graphs, offloading calculations to a graphics processor (GPU), or the use of a specialized hardware accelerator such as a Tensor Processing Unit (TPU) [8]. The peculiarity of such a chip is that it is specially designed for working with models and processing multidimensional data. A simplified diagram of a tensor processing unit from Nvidia is shown in Figure 3.

Figure 3 – Tensor processing unit (TPU) from Nvidia

Optimization of calculations can significantly accelerate machine learning algorithms. Using more efficient algorithms, optimizing computational graphs, and distributing computations to specialized hardware accelerators can substantially improve performance. However, these methods require a good understanding of algorithms and computational models, as well as experience in their implementation and optimization.
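The gain from expressing a computation as one optimized backend call rather than explicit loops can be shown with a toy 1-D convolution; here the vectorized NumPy call stands in for dispatching to SIMD, GPU, or TPU kernels (the data is illustrative):

```python
import numpy as np

def conv_naive(x, k):
    """Valid 1-D correlation written with explicit Python loops."""
    out = np.zeros(len(x) - len(k) + 1)
    for i in range(len(out)):
        for j in range(len(k)):
            out[i] += x[i + j] * k[j]
    return out

def conv_vectorized(x, k):
    """Same result as one library call; np.convolve flips its kernel,
    so reversing k turns convolution into the correlation above."""
    return np.convolve(x, k[::-1], mode="valid")

x = np.arange(8, dtype=np.float64)
k = np.array([1.0, 0.0, -1.0])
assert np.allclose(conv_naive(x, k), conv_vectorized(x, k))
```

The two functions are mathematically identical; the only difference is where the inner loop runs, which is exactly the lever that computation-graph optimizers and accelerator offloading pull at a much larger scale.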
5 Selection of machine learning frameworks and tools

In addition, there are frameworks and tools specifically designed to optimize machine learning performance on iOS, including the CoreML library [9], Metal Performance Shaders, and the Metal API. Figure 4 shows the scheme of the CoreML framework, whose essence is to transform a conventional model built for stationary computing devices into a special optimized format for mobile devices, which is processed directly by the library itself and supplied to the mobile application under development.

Figure 4 – CoreML operation diagram

The Metal Performance Shaders framework contains a collection of highly optimized compute and graphics shaders designed for easy and efficient integration into a mobile application. These data-parallel primitives are specifically tuned to exploit the unique hardware features of each family of graphics processing units (GPUs) in order to ensure optimal performance. Applications using the Metal Performance Shaders framework achieve excellent performance without the need to create and maintain hand-written shaders for each GPU family. Metal Performance Shaders can be used together with other existing resources of your application (such as MTLCommandBuffer, MTLTexture, and MTLBuffer objects) and shaders [9]. The framework supports the following functionality:
- applying high-performance filters to images and extracting statistical and histogram data from them;
- implementing and running neural networks for machine learning training and inference;
- solving systems of equations, factorizing matrices, and multiplying matrices and vectors [10][11];
- accelerating ray tracing with high-performance ray intersection and geometry testing.
In turn, the Metal library is a low-level, low-overhead software interface for hardware-accelerated 3D graphics and compute, developed by Apple and debuted in iOS 8.
Metal combines functions similar to OpenGL and OpenCL in one package. It is designed to improve performance by providing low-level access to the hardware capabilities of the graphics processor (GPU) for applications on iOS, iPadOS, macOS, and tvOS. It can be compared with low-level APIs on other platforms such as Vulkan and DirectX 12. Metal is object-oriented, which allows it to be used with programming languages such as Swift, Objective-C, or C++17. According to Apple's promotional materials, MSL (Metal Shading Language) is a single language that allows for closer integration of graphics and compute programs [12]. There are analogues of these libraries for devices based on the Android operating system, but they are not considered here, since the article focuses specifically on Apple devices and their A-series chips. These tools provide optimized functions and APIs that enable efficient use of the hardware capabilities of the devices. However, using these frameworks requires additional effort to integrate existing models and algorithms, as well as to study their features and capabilities.
6 Conducting experiments on combining algorithms

To conduct experiments on optimizing machine learning algorithms on iOS, a methodology based on a systematic study of various parameters and settings of the algorithms was used. The key stage of the experiments was determining the optimal values of parameters such as the learning rate, the batch size, and the number of training epochs. These parameters were chosen because they have the greatest impact on the speed and quality of training. Special attention was also paid to choosing the optimal network structure and an optimization algorithm adapted specifically to the iOS platform, which made it possible to significantly improve the performance of machine learning algorithms on iOS devices. To obtain the most accurate result with the most efficient algorithms, the following experiments were carried out:
- Each of the listed neural network optimization methods [12] was considered in pairs – quantization [13], compression, the use of a TPU, and the framework used. Various combinations were tried in search of the best efficiency; some representative combinations are shown in Table 1.
- Because quantization and compression critically reduce the accuracy of neural networks [14][15], a separate measurement was carried out without them – using only the TPU chip in two separate combinations with different frameworks, CoreML and Metal.

Table 1 – Approximate options for combining optimization methods
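The parameter search described above can be sketched as an exhaustive grid over learning rate, batch size, and epoch count. Here `train_and_score` is a hypothetical placeholder for the real train-and-validate step, and the grid values are illustrative, not the settings used in the article's experiments:

```python
from itertools import product

# Hypothetical search grid; values are illustrative, not the paper's.
learning_rates = [1e-2, 1e-3]
batch_sizes = [16, 32]
epochs = [5, 10]

def train_and_score(lr, bs, ep):
    """Placeholder: train a model with these settings and return
    validation accuracy. The formula below is a dummy stand-in."""
    return 1.0 - abs(lr - 1e-3) - 0.001 * bs / 32 + 0.001 * ep

# evaluate every combination and keep the best-scoring one
best = max(product(learning_rates, batch_sizes, epochs),
           key=lambda cfg: train_and_score(*cfg))
print("best config (lr, batch, epochs):", best)
```

A full grid costs one training run per combination (here 2 × 2 × 2 = 8), which is why the text restricts the search to the few parameters with the greatest impact.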
The result confirmed the hypothesis that the use of compression and quantization algorithms radically reduces the accuracy of the initial neural network: the accuracy of the machine learning algorithm fell several times over, although the speed of operation increased by an order of magnitude. At the same time, the second experiment gave good results: combining the TPU chip with the CoreML and Metal frameworks increased performance without reducing accuracy, with one caveat – each framework must be used only for the tasks it suits, namely machine learning algorithms with CoreML and 2D/3D processing of images and models with the Metal framework. During the experiments, another possible direction for optimization was also revealed: splitting the processing pipeline into two components, one for the CPU and one for the GPU, over mutually compatible equivalence classes of inputs.
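The CPU/GPU split by equivalence classes might be sketched as follows. The partition rule, the worker functions, and the use of a Python thread pool are all illustrative assumptions; a real implementation would dispatch the second class to Metal or Neural Engine kernels rather than a second thread:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partition: small inputs form the CPU equivalence class,
# large ones the accelerator class; both classes run concurrently.
def cpu_process(item):
    return ("cpu", item * 2)

def gpu_process(item):
    return ("gpu", item * 2)   # stand-in for an accelerator kernel

items = [3, 120, 7, 450, 11, 800]
cpu_class = [x for x in items if x < 100]    # equivalence class 1
gpu_class = [x for x in items if x >= 100]   # equivalence class 2

with ThreadPoolExecutor(max_workers=2) as pool:
    cpu_future = pool.submit(lambda: [cpu_process(x) for x in cpu_class])
    gpu_future = pool.submit(lambda: [gpu_process(x) for x in gpu_class])
    results = cpu_future.result() + gpu_future.result()
```

The point of partitioning into equivalence classes is that the two queues never need to synchronize mid-batch, so the CPU contributes throughput instead of idling while the accelerator works.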
Conclusion

The article presented the main ways to optimize machine learning algorithms; however, to achieve the best result, a synthesis of several of the described approaches is necessary. After experimenting with the described optimization algorithms, an attempt was made to combine all available optimization methods to achieve the best result and to solve the problem described earlier – the insufficient effectiveness of individual optimization methods [16][17]. The research and experiments revealed that the combination of compression and quantization radically reduces the accuracy of the original neural network. Thus, to achieve optimization with acceptable accuracy losses, it is recommended to use only one of the ways of optimizing the algorithm itself. The toolkit developers' recommendation to combine the capabilities of a special chip with one of the frameworks was confirmed empirically [18]. Another result of the experiments was the identification of a new direction for optimization: splitting the input data processing into equivalence classes so that processing takes place in parallel not only on the GPU and TPU, but also on the CPU. An approximate conditional partitioning scheme is shown in Figure 5. Although the central processor is not designed for this kind of operation, with proper partitioning it increased execution speed.

Figure 5 – Splitting the processing object by equivalence classes

As further areas of work, it is planned to investigate the application of the proposed method to mobile devices running the Android operating system [19], as well as to implement in practice the synthesis of several algorithms for optimizing machine learning models.
For the practical implementation of such a solution, a separate study comparing the performance of various machine learning algorithms on iOS mobile devices will be conducted as part of the author's dissertation and scientific work. The main purpose of this article was to determine the effectiveness and compare the performance of various algorithms in order to identify the most suitable ones for use on iOS devices. To do this, standard datasets were used, such as MNIST for handwritten digit recognition and ImageNet [20] for image classification, so that the results could be compared with other studies. The experiments take into account various factors that can affect algorithm performance, such as the size of the dataset, the complexity of the model, and the selected parameters. Experiments were conducted with various settings of the frameworks and libraries used with machine learning algorithms on mobile devices, and with different parameters, to assess their impact on performance.

References
1. Zhang, Y., Liu, Y., Chen, T., & Geng, Y. "Mobile Deep Learning for Intelligent Mobile Applications: An Overview." IEEE Access, 8, 103586-103607.
2. Apple Developer Documentation. "Core ML – Performance optimization on devices." Retrieved from https://developer.apple.com/documentation/coreml/optimizing_for_on-device_performance
3. Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." Proceedings of the European Conference on Computer Vision (ECCV) (pp. 525-542).
4. Sikhotan, H., Mark, A., Riandari, F., & Rendell, L. "Effective optimization algorithms for various machine learning tasks, including classification, regression and clustering." IEEE Access, 1, 14-24. doi:10.35335/idea.v1i1.3
5. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. "MobileNetV2: Inverted Residuals and Linear Bottlenecks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4510-4520).
6. Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." arXiv:1704.04861.
7. Han, S., Mao, H., & Dally, W. J. "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." arXiv:1510.00149.
8. Google TensorFlow Lite documentation. "TensorFlow Lite." Retrieved from https://www.tensorflow.org/lite
9. Thakkar, M. "Beginning Machine Learning in iOS: CoreML Framework." doi:10.1007/978-1-4842-4297-1
10. Minitaeva, A. M. (2022). Decision-making under interval assignment of preferences of decision makers. Proceedings of the conference "Information Technologies in Management" (ITU-2022): 15th Multi-Conference on Management Problems, St. Petersburg, October 4-6, 2022. St. Petersburg: Concern Central Research Institute Elektropribor (pp. 197-200).
11. Minitaeva, A. M. (2023). A multi-model approach to forecasting nonlinear non-stationary processes in optimal control problems. Irreversible Processes in Nature and Technology: Proceedings of the Twelfth All-Russian Conference. In 2 volumes, Moscow, January 31 – February 3, 2023. Moscow: Bauman Moscow State Technical University (pp. 438-447).
12. Kochnev, A. "Conceptual foundations of the practical use of neural networks: problems and prospects." Society and Innovations. doi:10.47689/2181-1415-vol4-iss1-pp1-10
13. Courbariaux, M., Bengio, Y., & David, J.-P. "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1." arXiv:1602.02830.
14. Li, G., Gao, W., & Wuen, G. "Quantization techniques." doi:10.1007/978-981-97-1957-0_5
15. Samsiana, S., & Syamsul, A. "Machine learning algorithms using the vector quantization learning method." doi:10.1051/e3sconf/202450003010
16. Atayero, J. A., & Adjani, S. "Overview of machine learning on embedded and mobile devices: optimization and applications." doi:10.3390/s21134412
17. Sandler, M., Howard, A., & LeCun, Y. "MobileNetV3: A highly efficient scalable model for mobile computer vision." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 13840-13848).
18. Chen, B., Danda, R., & Yuan, Ch. "Towards the theft of deep neural networks on mobile devices." Security and Privacy in Communication Networks (pp. 495-508). doi:10.1007/978-3-030-90022-9_27
19. Jarmuni, F., & Fawzi, A. "Running neural networks on Android." In Introduction to Deep Learning and Neural Networks with Python. University of Ottawa (pp. 247-280). doi:10.1016/B978-0-323-90933-4.00001-2
20. Bykov, K., & Müller, K. "The dangers of watermarked images in ImageNet." Artificial Intelligence. ECAI 2023 International Workshops (pp. 426-434). doi:10.1007/978-3-031-50396-2_24
First Peer Review
Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.
Second Peer Review