Software systems and computational methods
Reference:
Ishankhonov, A.Y., Pshychenko, D.V., Mozharovskii, E.A., & Aluev, A.S. (2024). The Role of LLM in Next-Generation Integrated Development Environments. Software Systems and Computational Methods, 4, 140–150. https://doi.org/10.7256/2454-0714.2024.4.72022
The Role of LLM in Next-Generation Integrated Development Environments
DOI: 10.7256/2454-0714.2024.4.72022
EDN: KMTOBG
Received: 18-10-2024
Published: 05-01-2025

Abstract: This article examines the role of Large Language Models (LLMs) in next-generation integrated development environments (IDEs). Tools such as GitHub Copilot, IntelliCode, and Alice Code Assistant are explored in the context of their use in programming. The authors examine how LLMs automate key development tasks, including code autocompletion, error detection, refactoring, and code generation, thereby increasing development efficiency and improving code quality. Special emphasis is placed on how LLMs affect developers' cognitive processes, such as problem solving, creativity, and professional skills. The study reviews existing IDEs that utilize large language models and evaluates LLM functionality for code autocompletion, fragment generation, and error detection and correction. Comparative methods were applied to assess the effectiveness of LLMs against traditional development tools, with particular attention to the cognitive load caused by LLM use and to its impact on the creative process. The novelty of the research lies in a comprehensive analysis of LLM application in modern IDEs and in revealing their potential for increasing developer productivity and improving the quality of program code. It is concluded that integrating LLMs into IDEs not only speeds up code creation but also considerably improves its quality through intelligent support and automation of routine tasks. At the same time, while the benefits of integrating LLMs into IDEs are clear, limitations are identified concerning cognitive load, ethical issues, data security, and the need to balance automation against the development of programmers' own skills.
Keywords: Large Language Models, Integrated Development Environments, programming automation, code improvement, artificial intelligence, code completion, software systems, machine learning, development process optimization, data analysis

Introduction

Large Language Models (LLMs) based on the transformer architecture have driven significant progress in artificial intelligence (AI) and natural language processing. Originally developed for text-processing tasks such as machine translation, text generation, and question answering, they quickly found wide application across industries, including programming and software development. In modern integrated development environments (IDEs), LLMs have become an important tool that can substantially automate processes, increase development efficiency, and improve code quality.

Integrating LLMs into an IDE makes it possible to automate many tasks, including code autocompletion, error detection and correction, and generation of program fragments. These features let developers spend less time on routine operations, make fewer errors, and focus on more complex and creative tasks. In addition, LLMs provide intelligent support in the form of recommendations for optimizing and improving code, which can significantly raise productivity and quality of work.

However, introducing LLMs into software development processes raises a number of questions and challenges. A key concern is the influence of such models on developers' cognitive processes, on their capacity for learning and self-improvement, and on their professional competencies. It is necessary to assess how constant interaction with LLMs affects programmers' workload, motivation, and ability to think creatively.
The ethical aspects of applying LLMs in programming are particularly relevant, including data security, intellectual property protection, and the risks of over-reliance on AI technologies. The purpose of this article is to analyze the role of LLMs in next-generation IDEs. The focus is on the cognitive and social aspects of developers' interaction with LLMs, as well as on their impact on programming effectiveness, changes in working methods, and teamwork.

Main part

The evolution of LLMs and their application in IDEs

The first language models, created in the 1990s, were based on statistical approaches such as n-grams and hidden Markov models. N-grams are sequences of n elements (words, symbols, or other units) that occur in text or data [1]. In natural language processing (NLP), n-grams are used to model text and predict the next element from the preceding ones. For example, the sentence "Machine learning helps to solve problems" yields the bigrams (sequences of two elements) "machine learning", "learning helps", "helps to", "to solve", and "solve problems". Hidden Markov models rest on the assumption that the probability of each element of code or text depends only on a few preceding elements rather than on the entire sequence [2]. Such models estimate the probability of the next word in a text, or of the next step when writing code, from the frequencies of elements observed in training data (Fig. 1).

Figure 1. Diagram of the Markov model

Despite their popularity and widespread use in the early stages of intelligent IDE tooling, technologies such as n-grams and Markov models have significant limitations. Because these methods analyze only local context, they cannot capture long-range dependencies between elements of code or text.
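As an illustration (not part of the original article), the bigram prediction idea described above can be sketched in a few lines of Python. The function names here are invented for this sketch; a real n-gram model would also apply smoothing for unseen pairs.

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Count successor frequencies to estimate P(next | current)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent successor of `word` seen in training."""
    if word not in counts:
        return None  # unseen context: a real model would smooth here
    return counts[word].most_common(1)[0][0]

tokens = "machine learning helps to solve problems".split()
model = train_bigram_model(tokens)
print(predict_next(model, "learning"))  # helps
```

The same counting scheme extends to code tokens, which is how early statistical completion engines ranked candidate next tokens.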
In an IDE this means that such models may miss important connections between variables, functions, or code blocks, especially in large projects with long-range dependencies. Ignoring deeper semantic connections limits the accuracy of autocompletion, error detection, and code analysis, since Markov models and n-grams cannot effectively account for the global structure of a program.

A significant stage in the development of language models was the emergence of Recurrent Neural Networks (RNNs) in the early 2000s [3]. These models can preserve the context of previous elements in code or text over longer sequences, which allows an IDE to better understand the structure and logic of programs (Fig. 2).

Figure 2. RNN diagram

However, even more advanced RNN variants, such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), struggle with long code, since the amount of important information they retain gradually decays as sequence length grows. In 2017 the transformer architecture appeared and changed the approach to language modeling. Transformers use the attention mechanism, which allows models to focus on significant parts of the text regardless of where those parts appear in the sequence (Fig. 3).

Figure 3. Transformer architecture diagram

In an IDE, the encoder block plays an important role in converting source code or text into intermediate representations, which are then passed through a multi-layered attention architecture. This allows the IDE to analyze relationships between different code elements, identifying key structures and dependencies regardless of their location in the codebase. The attention mechanism in the encoder lets any elements of the sequence interact, be they variables, functions, or code blocks, which makes it possible to handle long-range dependencies such as the link between a variable's definition and its use in other parts of the program.
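To make the attention mechanism concrete, the core computation (scaled dot-product attention) can be sketched in plain Python. This is a minimal illustration added for this text, not code from the article; real transformers use learned projection matrices, multiple heads, and tensor libraries.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention: every query attends to every key,
    so distant positions influence the output as easily as nearby ones."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # weights sum to 1 over all positions
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Toy 2-dimensional embeddings for three positions
q = [[1.0, 0.0]]
k = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(attention(q, k, v))
```

Because the weights are computed over the entire sequence at once, the dependency between a variable's definition and a distant use costs the same as one between adjacent tokens, which is exactly the advantage over n-gram and recurrent models described above.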
This approach significantly improves performance compared with previous models such as RNNs and LSTMs, which had difficulty accounting for remote dependencies in code [4]. It was an important step in the development of tools for autocompletion, error analysis, and code refactoring. The decoder block uses the intermediate representations created by the encoder to generate a new code sequence, for example to autocomplete a function, generate tests, or predict the next development step.

Transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are widely used in many fields, including programming tasks. Thanks to its bidirectional architecture, BERT processes text taking into account the context both to the left and to the right of each token, which makes it especially useful for code analysis and refactoring, autocompletion, and error detection [5]. In an IDE this allows the tool to understand developers' intentions more accurately and to suggest optimal ways to complete code or improve its structure. GPT, with its unidirectional approach to generation, is especially useful for autocompletion and for generating code from text descriptions [6]. In IDEs, GPT can be used to write new code snippets, predict the next programming step, create templates, and auto-generate tests.

The role of LLMs in the transformation of modern IDEs

The introduction of LLMs into modern IDEs marks a significant stage in the evolution of programming tools. LLMs can analyze code, predict upcoming fragments, and propose fixes, which significantly speeds up development and improves its quality. This not only eases routine tasks but also gives developers intelligent support at all stages of software creation (Table 1).

Table 1. The influence of LLMs on IDEs [7, 8]
The use of LLMs in IDEs significantly transforms the programming process, offering new opportunities for automation and intelligent support. These technologies not only increase development efficiency by speeding up routine operations but also improve the quality of the software being created, reducing the likelihood of errors at early stages [9]. LLMs promote more intuitive interaction with the code, allowing developers to focus on creative and complex tasks while minimizing the time spent finding and correcting errors. Successful integration, however, requires accounting for limitations such as dependence on the quality of training data and the need for developer oversight to prevent inefficient or unsafe solutions from being introduced. The ethical aspects of model use, especially intellectual property and data privacy, must also be considered.

Cognitive interaction of developers with LLMs

The integration of LLMs into software development has a significant impact on how developers interact cognitively with code and with their tools. A key aspect is the reduction of cognitive load: by automating routine tasks such as code completion and syntax-error correction, LLMs let developers concentrate on the more complex and creative aspects of programming. Interaction with LLMs also changes the decision-making process. The models offer a variety of possible solutions to a problem, which widens the developer's choice and can speed up the search for the right one; this reduces the effort spent on information retrieval and makes decisions more intuitive and efficient. Another important aspect of cognitive interaction with LLMs is support for the creative process: LLMs can suggest non-standard solutions and help developers go beyond habitual patterns.
A further consideration is the adaptation of an LLM to the developer's personal programming style. With continued use, the model can learn the style and preferences of a particular engineer, offering solutions that match their habits and practices. This shortens the time needed to complete tasks and makes interaction with the model more intuitive.

LLM application in IDEs: analysis of modern solutions in large companies

International companies are actively embedding LLMs in IDEs to automate programming, improve code quality, and raise developer productivity. One of the best-known tools is GitHub Copilot, developed by GitHub and Microsoft on the basis of the OpenAI Codex model [10]. It is embedded in popular IDEs such as Visual Studio Code and assists developers by autocompleting code, generating entire program fragments from text descriptions, and suggesting code improvements. GitHub Copilot is trained on billions of lines of code from open repositories, which allows it to provide contextually accurate hints for many programming languages. The tool is widely used in American companies to speed up development and reduce the load on programmers: at the language-learning platform Duolingo, for example, it increased development speed by 25%, while the average code-review turnaround time decreased by 67%.

Another Microsoft product, IntelliCode, is an intelligent code-completion system integrated into Visual Studio and Visual Studio Code. IntelliCode uses AI to analyze code and offer recommendations based on best practices and on code used in real projects [11]. For instance, when writing a function in Python, IntelliCode can suggest optimal use of standard libraries and improvements that make the code more efficient.
Consider the following example:

    def process_data(data):
        result = []
        for item in data:
            result.append(item * 2)
        return result

    data = [1, 2, 3, 4]
    print(process_data(data))    (1)

When this code is written, the system can suggest an equivalent version using built-in Python facilities such as map() with a lambda, which makes the function more concise:

    def process_data(data):
        return list(map(lambda x: x * 2, data))

    data = [1, 2, 3, 4]
    print(process_data(data))    (2)

Unlike GitHub Copilot, which offers broad recommendations, IntelliCode focuses on improving code quality by following standards and analyzing the developer's previous actions. The tool is actively used by American and international companies to improve the efficiency of software development.

Yandex, one of the leaders of the Russian IT market, has developed its own LLM-based solutions to support developers. Within its ecosystem, LLMs have been integrated into the development environment through Alice Code Assistant, which helps programmers with code autocompletion, test suggestions, and automatic error correction [12]. When writing a Python function for data processing, Alice Code Assistant can offer autocompletion and automatic test generation for the function.
Consider an example:

    def process_data(data):
        processed_data = []
        for item in data:
            processed_data.append(item.lower())
        return processed_data

    data = ["Hello", "World", "Alice"]
    print(process_data(data))    (3)

Alice Code Assistant can suggest optimizing this function with more concise built-in Python constructs such as a list comprehension, and can also generate tests for it:

    # Optimized code following the Alice Code Assistant recommendation
    def process_data(data):
        return [item.lower() for item in data]

    # Automatically generated test
    def test_process_data():
        assert process_data(["Hello", "World", "Alice"]) == ["hello", "world", "alice"]
        assert process_data([]) == []

    # Running the test
    test_process_data()    (4)

The tool is used in various Yandex projects, including the development of search and advertising technologies. Alice Code Assistant supports multiple programming languages and is actively used within Yandex to accelerate the development of new services and applications.

Problems and challenges of integrating LLMs into next-generation integrated development environments

The integration of LLMs into next-generation IDEs brings many benefits, such as automation of routine tasks, improved code quality, and increased productivity. However, the process also faces a number of problems, including dependence on data quality, insufficient adaptation of models to specific projects, intellectual property issues, and ethical risks. Successful adoption requires taking these challenges into account and developing effective strategies to address them (Table 2).

Table 2. LLM integration problems and ways to solve them [13, 14]
Analysis of the problems of deploying LLMs in integrated development environments shows that successful adoption of these technologies requires an integrated approach and careful monitoring. Although LLMs can significantly increase productivity and automate routine tasks, their application must be accompanied by adaptation to the specifics of the project and by improvement of the data on which the models are trained; this helps avoid biases and errors in model behavior. It is also important to balance automation against the development of developers' skills so that their qualifications do not decline. Ethical issues likewise require special attention: LLMs must be used within strict standards of security and data protection to prevent leaks of confidential information and infringement of intellectual property rights.

Conclusion

The use of LLMs in IDEs significantly changes programming processes, automating routine tasks and improving overall productivity. LLMs allow developers not only to create code faster but also to improve its quality through intelligent support and autocompletion. Despite these clear advantages, the use of such technologies requires solving a number of ethical and technical issues. In particular, dependence on automation must be kept in check in order to preserve developers' professional skills, and data-security and confidentiality standards must be observed. Successful LLM integration requires a thoughtful, responsible approach that balances automation with preserving the creative potential of programmers.

References
1. Ivanov, K. N., & Zakharova, O. I. (2023). Natural language processing. Application of language models. Actual Problems of Informatics, Radiotechnics, and Communications, 155-156.
2. Korostin, O. (2024). Comparative analysis of NLP algorithms for optimizing communications in the maritime industry. Journal of Science. Lyon, 56, 19-22.
3. Qin, Z., Yang, S., & Zhong, Y. (2024). Hierarchically gated recurrent neural network for sequence modeling. Advances in Neural Information Processing Systems, 36.
4. Uzkikh, G. Yu. (2024). Application of transformers in natural language processing. Bulletin of Science, 4(8), 186-189.
5. Gweon, H., & Schonlau, M. (2024). Automated classification for open-ended questions with BERT. Journal of Survey Statistics and Methodology, 12(2), 493-504.
6. Liukko, V., Knappe, A., Anttila, T., & Hakala, J. (2024). ChatGPT as a Full-Stack Web Developer. In Generative AI for Effective Software Development (pp. 197-215). Cham: Springer Nature Switzerland.
7. Ponomarev, E. (2024). Optimizing android application performance: modern methods and practices. Sciences of Europe, 149, 62-64.
8. Makaryan, O. S. (2024). Software development using artificial intelligence. Bulletin of Master's Degree, 23.
9. Bobunov, A. Yu. (2024). Comparison of testing automation practices in traditional banks and fintech companies. Journal of Science, 8. Retrieved from http://www.dnevniknauki.ru/images/publications/2024/8/technics/Bobunov.pdf
10. Koyanagi, K., Wang, D., Noguchi, K., et al. (2024). Exploring the effect of multiple natural languages on code suggestion using GitHub Copilot. In Proceedings of the 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR) (pp. 481-486).
11. Oh, S., Lee, K., Park, S., & Kim, D. (2024). Poisoned ChatGPT finds work for idle hands: exploring developers' coding practices with insecure suggestions from poisoned AI models. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP) (pp. 1141-1159).
12. Zhikulina, K. P., Perfilieva, N. V., & Man, L. (2024). Digital strategy of language paradigm. Bulletin of Peoples' Friendship University of Russia. Series: Language Theory. Semiotics. Semantics, 15(2), 364-375.
13. Pekareva, V. V., & Frolovskaya, Yu. I. (2024). Semantic analysis of the term "information" for systematizing approaches and factors of information security in the context of digitalization. Agrarian and Land Law, 3(231), 89-92. doi:10.47643/1815-1329_2024_3_89
14. Verner, D. (2024). Integration of artificial intelligence in backend development. Annali d'Italia, 59, 88-91.