Software systems and computational methods
Reference:

The Role of LLM in Next-Generation Integrated Development Environments

Ishankhonov Azizkhon Yunushon

ORCID: 0009-0009-8934-6289

Graduate student; Department of Non-Ferrous Metals Metallurgy; National Research Technological University 'MISIS'

119049, Russia, Moscow, Leninsky ave., 4, p. 1

m180119@edu.misis.ru
Pshychenko Dmitrii Viktorovich

ORCID: 0009-0006-8866-8057

independent researcher

Shabolovka str., 26-28, Moscow, 119049, Russia

dmitry.pshychenko@rambler.ru
Mozharovskii Evgenii Aleksandrovich

ORCID: 0009-0005-9957-1632

independent researcher

119991, Russia, Moscow, Leninskie gory str., 1

mozharovsky_ea@rambler.ru
Aluev Andrei Sergeevich

ORCID: 0009-0001-6737-7545

Master's degree; Ural Federal University

620062, Russia, Yekaterinburg, Mira str., 19

aluev_andrei@rambler.ru

DOI:

10.7256/2454-0714.2024.4.72022

EDN:

KMTOBG

Received:

18-10-2024


Published:

05-01-2025


Abstract: The article examines the role of Large Language Models (LLMs) in new-generation integrated development environments (IDEs). Tools such as GitHub Copilot, IntelliCode, and Alice Code Assistant are explored in the context of their use in programming. The authors examine how LLMs enable the automation of key development tasks, including code autocompletion, error detection, refactoring, and code generation, which increases development efficiency and improves code quality. Special emphasis is placed on how LLMs affect developers' cognitive processes, such as problem-solving ability, creativity, and professional skills. The study reviews existing integrated development environments that utilize large language models and evaluates LLM functionality for code autocompletion, fragment generation, and error detection and correction. Comparative methods were applied to assess the effectiveness of LLMs relative to traditional development tools. Special attention was paid to analyzing the cognitive load caused by the use of LLMs and to assessing their impact on the creative process. The novelty of the research lies in a comprehensive analysis of LLM application in modern IDEs and in revealing their potential for increasing developer productivity and improving the quality of program code. It is concluded that integrating LLMs into IDEs not only speeds up code creation but also considerably improves its quality through intelligent support and automation of routine tasks. However, while the benefits of integrating LLMs into IDEs are clear, the study also identifies limitations related to cognitive load, ethical issues, data security, and the need to maintain a balance between automation and the development of programmers' skills.


Keywords:

Large Language Models, Integrated Development Environments, programming automation, code improvement, artificial intelligence, code completion, software systems, machine learning, development process optimization, data analysis

This article is automatically translated.

Introduction

Large Language Models (LLMs) based on the transformer architecture demonstrate significant progress in the field of artificial intelligence (AI) and natural language processing. Originally developed to solve text-processing tasks such as machine translation, text generation, and question answering, they quickly found wide application in various industries, including programming and software development. In modern Integrated Development Environments (IDEs), LLMs have become an important tool that can significantly automate processes, increase development efficiency, and improve code quality.

Integrating LLMs into the IDE makes it possible to automate many tasks, including code auto-completion, error detection and correction, and generation of program fragments. These features allow developers to spend less time on routine operations, reduce errors, and focus on solving more complex and creative tasks. In addition, LLMs provide intelligent support in the form of recommendations for optimizing and improving code, which can significantly raise productivity and quality of work.

However, the introduction of LLMs into software development processes raises a number of questions and challenges. One of the key aspects is the influence of such models on developers' cognitive processes, their capacity for learning and self-improvement, and their professional competencies. There is a need to assess how constant interaction with LLMs affects programmers' workload, motivation, and ability to think creatively. The ethical aspects of LLM use in programming are particularly relevant, including data security, intellectual property protection, and the risks of over-reliance on AI technologies.

The purpose of this article is to analyze the role of LLMs in next-generation IDEs. The focus is on the cognitive and social aspects of developers' interaction with LLMs, as well as on assessing their impact on programming effectiveness, changes in working methods, and teamwork.

The main part. The evolution of LLMs and their application in IDE

The first LLMs, created in the 1990s, were based on statistical approaches such as n-grams and hidden Markov models. N-grams are sequences of n elements (words, symbols, or other units) that occur in text or data [1]. In the context of natural language processing (NLP), n-grams are used to model text and predict the next element based on the previous ones. For example, the sentence "machine learning helps to solve problems" yields the bigrams (sequences of two elements): "machine learning", "learning helps", "helps to", "to solve", "solve problems".
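As an illustration, extracting the bigrams from the sentence above takes only a few lines of Python. The `ngrams` helper below is written for this sketch and is not part of any library:

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "machine learning helps to solve problems".split()
bigrams = ngrams(tokens, 2)
# [('machine', 'learning'), ('learning', 'helps'), ('helps', 'to'),
#  ('to', 'solve'), ('solve', 'problems')]
```

Calling the same helper with n = 3 yields trigrams, and so on for longer contexts.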

Hidden Markov models are based on the assumption that the probability of each element of a code or text appearing depends only on several previous elements, and not on the entire sequence [2]. Such models calculate the probabilities of words in a text, or of the next step when writing code, based on observations from training data, where the frequency of occurrence of elements is estimated (Fig. 1).

Figure 1. Diagram of the Markov model

Despite their popularity and widespread use in the early stages of the development of intelligent IDE tools, technologies such as n-grams and Markov models have significant limitations. These methods are based on the analysis of the local context, which does not allow taking into account the long-term dependencies between the elements of the code or text. In an IDE, this means that such models may miss important connections between variables, functions, or code blocks, especially in large projects with long dependencies. Ignoring deeper semantic connections limits the accuracy of autocompletion, error detection, and code analysis, since Markov models and n-grams cannot effectively account for the global structure of a program.
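The frequency-based prediction that such models perform can be sketched in a few lines of Python: successor counts are collected from training data, and the most frequent successor of the current token is proposed as the next element. This is a toy illustration of the principle, not a production model:

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """For each token, count how often each successor follows it."""
    successors = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        successors[prev][nxt] += 1
    return successors

def predict_next(model, token):
    """Return the most frequent successor of `token`, or None if unseen."""
    if token not in model:
        return None  # this local context never occurred in training
    return model[token].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat ran".split()
model = train_bigram_model(corpus)
print(predict_next(model, "the"))   # 'cat' (follows 'the' twice, vs. 'mat' once)
```

The `None` branch illustrates the limitation discussed above: the model knows nothing beyond the immediately preceding element and cannot capture the global structure of a program.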

A significant stage in the development of LLMs was the emergence of Recurrent Neural Networks (RNNs) in the early 2000s [3]. These models are able to preserve the context of previous elements in code or text over longer sequences, which allows an IDE to better understand the structure and logic of programs (Fig. 2).

Figure 2. RNN architecture diagram

However, even more advanced RNN variants, such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), struggle with long code sequences: important information is gradually lost as sequence length increases.

In 2017, the transformer architecture appeared, changing the approach to language modeling. Transformers use an attention mechanism that allows models to focus on significant parts of the text regardless of their position in the sequence (Fig. 3).

Figure 3. Transformer architecture diagram

In the IDE, the encoder block plays an important role in converting source code or text data into intermediate representations, which are then transmitted through a multi-layered attention architecture. This allows the IDE to analyze the relationships between different code elements, identifying key structures and dependencies, regardless of their location in the codebase. The attention mechanism in the encoder provides interaction between any elements of the sequence, be they variables, functions, or code blocks, which makes it possible to effectively handle long-term dependencies, such as the relationship between the definition of a variable and its use in other parts of the program.

This approach significantly improves the performance of the IDE compared to previous models such as RNN and LSTM, which had difficulty taking into account remote dependencies in the code [4]. This innovation was an important step in the development of tools for auto-completion, error analysis, and code refactoring. The decoder block uses intermediate representations created by the encoder to generate a new code sequence, for example, to auto-complete a function, generate tests, or predict the next development step.
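The attention computation at the heart of this architecture can be sketched in plain Python. The sketch below implements scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, for toy two-dimensional embeddings; the vectors are illustrative only:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(Q[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, regardless of position
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted mix of all value vectors
        outputs.append([sum(wi * v[j] for wi, v in zip(w, V))
                        for j in range(len(V[0]))])
    return outputs, weights

# Self-attention over three toy token embeddings
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(X, X, X)
# Every row of `w` sums to 1: each position attends to all others.
```

Because every query is compared with every key, a dependency between the first and last elements of a long sequence carries the same structural weight as one between neighbors, which is what allows transformers to handle long-range dependencies that n-grams and RNNs miss.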

Transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are widely used in various fields, including programming-related tasks. Thanks to its bidirectional architecture, BERT processes text taking into account the context both to the left and to the right of each word, which makes it especially useful for code analysis and refactoring, auto-completion, and error detection [5]. In an IDE, this makes it possible to understand developers' intentions more accurately and to offer optimal solutions for completing code or improving its structure. GPT, with its unidirectional generative approach, is especially useful for auto-completion and code generation from text descriptions [6]. In integrated development environments, GPT can be used to write new code snippets, predict the next programming step, create templates, and auto-generate tests.

The role of LLM in the transformation of modern IDEs

The introduction of LLM into modern IDEs is a significant stage in the evolution of programming tools. LLMs have the ability to analyze code, predict the next fragments, and propose fixes, which significantly speeds up the development process and improves its quality. This not only facilitates routine tasks, but also provides developers with intellectual support at all stages of software creation (Table 1).

Table 1. The influence of LLMs on IDEs [7, 8]

Application aspect | Description | Advantages for development | Examples of tools
Code auto-completion | Models predict the next piece of code based on the current context. | Faster code writing; fewer syntax errors. | GitHub Copilot, IntelliCode
Real-time code analysis | Automatic error checking of code; optimization suggestions during writing. | Better code structure and quality; fewer errors during the development phase. | DeepCode, CodeGuru
Error detection and syntax correction | Models identify logical and syntactic errors and suggest corrections. | Higher overall code reliability; reduced debugging time. | Tabnine, Kite
Forecasting and automating routine tasks | Automation of repetitive operations such as template code generation or test creation. | Lower cognitive load on developers; faster execution of standard tasks. | Codex, Replit

The use of LLM in the IDE significantly transforms the programming process, offering new opportunities for automation and intelligent support. These technologies not only increase development efficiency by speeding up routine operations, but also affect the quality of the software being created, reducing the likelihood of errors in the early stages [9]. LLM implementation promotes more intuitive interaction with the code, allowing developers to focus on solving creative and complex tasks, while minimizing the time spent on finding and correcting errors.

Successful LLM integration requires consideration of possible limitations, such as dependence on the quality of training data and the need for control by developers to prevent the introduction of inefficient or unsafe solutions. It is also important to consider the ethical aspects of using models, especially regarding intellectual property and data privacy.

Cognitive interaction of developers with LLM

The integration of LLMs into software development processes has a significant impact on developers' cognitive interaction with code and development tools. One of the key effects is a reduction in the cognitive burden on the developer. By automating routine tasks such as code completion and syntax error correction, LLMs allow developers to focus on the more complex and creative aspects of programming.

In addition, interaction with LLM changes the decision-making process. The models provide a variety of possible solutions to problems, which expands the developer's choice and can speed up the process of finding the right solution. This helps to reduce the amount of effort involved in information retrieval and makes the decision-making process more intuitive and efficient.

An important aspect of cognitive interaction with LLMs is support for the creative process. LLMs are able to offer non-standard solutions and help developers go beyond familiar patterns. Another important aspect is the adaptation of an LLM to a developer's personal programming style. Over time, the model can learn the style and preferences of a particular developer, offering solutions that match their habits and practices. This reduces the time required to complete tasks and makes interaction with the model more intuitive.

LLM application in IDE: analysis of modern solutions in large companies

International companies are actively implementing LLMs in IDEs to automate programming processes, improve code quality, and increase developer productivity. One of the best-known tools is GitHub Copilot, developed by Microsoft on the basis of the OpenAI Codex model [10]. It is embedded in popular IDEs such as Visual Studio Code and helps developers by auto-completing code, generating entire program fragments from text descriptions, and suggesting code improvements. GitHub Copilot is trained on billions of lines of code from open repositories, which allows it to provide contextually accurate hints for many programming languages. The tool is widely used in American companies to speed up development and reduce the burden on programmers. For example, it increased development speed at the Duolingo language-learning platform by 25%, while the average code review time decreased by 67%.

Another Microsoft product, IntelliCode, is an intelligent code completion system integrated into Visual Studio and Visual Studio Code. IntelliCode uses AI to analyze code, offering recommendations based on best practices and on code used in real projects [11]. For example, when writing a function in Python, IntelliCode can suggest optimal use of standard libraries and improvements that increase code efficiency. Consider the following example:

def process_data(data):
    result = []
    for item in data:
        result.append(item * 2)
    return result

data = [1, 2, 3, 4]
print(process_data(data))  # prints [2, 4, 6, 8]   (1)

When using IntelliCode, the system can suggest optimized code that relies on built-in Python facilities such as map() and lambda, which can speed up program execution:

def process_data(data):
    return list(map(lambda x: x * 2, data))

data = [1, 2, 3, 4]
print(process_data(data))  # prints [2, 4, 6, 8]   (2)

Unlike GitHub Copilot, which offers broad recommendations, IntelliCode is more focused on improving code quality by following standards and analyzing previous developer actions. This tool is actively used by American and international companies to improve the efficiency of software development.

Yandex, one of the leaders of the Russian IT market, has developed its own LLM-based solutions to support developers. As part of their ecosystem, LLMs have been integrated into the development environment through Alice Code Assistant, which helps programmers by auto-completing code, offering tests, and automatically correcting errors [12]. When writing a function in Python for data processing, Alice Code Assistant can offer auto-completion and automatic test generation for the function. Consider an example:

def process_data(data):
    processed_data = []
    for item in data:
        processed_data.append(item.lower())
    return processed_data

data = ["Hello", "World", "Alice"]
print(process_data(data))  # prints ['hello', 'world', 'alice']   (3)

Alice Code Assistant can suggest optimizing this function by using more efficient built-in Python constructs such as list comprehensions, and can also generate tests for it:

# Optimized code following Alice Code Assistant's recommendation
def process_data(data):
    return [item.lower() for item in data]

# Automatically generated test
def test_process_data():
    assert process_data(["Hello", "World", "Alice"]) == ["hello", "world", "alice"]
    assert process_data([]) == []

# Running the test
test_process_data()   # (4)

This tool is used in various Yandex projects, including the development of search and advertising technologies. Alice Code Assistant provides support for various programming languages and is actively used within Yandex to accelerate the development of new services and applications.

Problems and challenges of integrating LLM into next-generation integrated development environments

The integration of LLM into the next generation IDE brings many benefits, such as automation of routine tasks, improved code quality, and increased productivity. However, this process is fraught with a number of problems, including dependence on data quality, insufficient adaptation of models to specific projects, intellectual property issues and ethical risks. To ensure successful LLM implementation, it is necessary to take these challenges into account and develop effective strategies to address them (Table 2).

Table 2. LLM integration problems and ways to solve them [13, 14]

Problem | Description | Solution methods
Data quality and bias | Models may be trained on erroneous or biased data, which reduces the accuracy of recommendations. | Continuous updating of training data; use of specialized data for specific tasks.
Lack of context | LLMs may not take into account the unique requirements of specific projects. | Development and deployment of models trained on data from the specific domain or project.
Intellectual property issues | Models may use copyrighted data or violate confidentiality. | Use of models trained on closed corporate data; strict access control mechanisms.
Dependence on technology | Excessive use of LLMs can lead to a decline in developer skills. | Limiting automation at key stages of development; encouraging learning and skill development.
Ethical issues and safety | Using LLMs may pose risks to data security. | Development and enforcement of ethical standards and safety protocols for LLM use.

An analysis of the problems of implementing LLM in integrated development environments shows that the successful implementation of these technologies requires an integrated approach and careful monitoring. Although LLMs can significantly increase productivity and automate routine tasks, their application must be accompanied by adaptation to the specifics of the project and improvement of the data on which the models are trained. This will avoid possible biases and errors in the operation of the models. In addition, it is important to ensure a balance between automation and the development of developer skills in order to prevent their qualifications from declining.

Ethical issues also require special attention. LLM must be used within the framework of strict standards related to security and data protection in order to prevent leakage of confidential information and infringement of intellectual property rights.

Conclusion

The use of LLM in the IDE significantly changes programming processes, contributing to the automation of routine tasks and improving overall productivity. Using LLM in the IDE allows developers not only to speed up code creation, but also to improve its quality through intelligent support and auto-completion. However, despite the obvious advantages, the use of such technologies requires solving a number of ethical and technical issues. In particular, it is important to control dependence on automation in order to maintain the professional skills of developers, as well as ensure compliance with data security and confidentiality standards. Successful LLM integration requires a thoughtful and responsible approach that provides a balance between automation and preserving the creative potential of programmers.

References
1. Ivanov, K. N., & Zakharova, O. I. (2023). Natural language processing. Application of language models. Actual Problems of Informatics, Radiotechnics, and Communications, 155-156.
2. Korostin, O. (2024). Comparative analysis of NLP algorithms for optimizing communications in the maritime industry. Journal of Science. Lyon, 56, 19-22.
3. Qin, Z., Yang, S., & Zhong, Y. (2024). Hierarchically gated recurrent neural network for sequence modeling. Advances in Neural Information Processing Systems, 36.
4. Uzkikh, G. Yu. (2024). Application of transformers in natural language processing. Bulletin of Science, 4(8), 186-189.
5. Gweon, H., & Schonlau, M. (2024). Automated classification for open-ended questions with BERT. Journal of Survey Statistics and Methodology, 12(2), 493-504.
6. Liukko, V., Knappe, A., Anttila, T., & Hakala, J. (2024). ChatGPT as a Full-Stack Web Developer. In Generative AI for Effective Software Development (pp. 197-215). Cham: Springer Nature Switzerland.
7. Ponomarev, E. (2024). Optimizing android application performance: modern methods and practices. Sciences of Europe, 149, 62-64.
8. Makaryan, O. S. (2024). Software development using artificial intelligence. Bulletin of Master’s Degree, 23.
9. Bobunov, A. Yu. (2024). Comparison of testing automation practices in traditional banks and fintech companies. Journal of Science, 8. [Electronic resource]. Retrieved from http://www.dnevniknauki.ru/images/publications/2024/8/technics/Bobunov.pdf
10. Koyanagi, K., Wang, D., Noguchi, K., et al. (2024). Exploring the effect of multiple natural languages on code suggestion using GitHub Copilot. In Proceedings of the 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR) (pp. 481-486).
11. Oh, S., Lee, K., Park, S., & Kim, D. (2024). Poisoned ChatGPT finds work for idle hands: exploring developers’ coding practices with insecure suggestions from poisoned AI models. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP) (pp. 1141-1159).
12. Zhikulina, K. P., Perfilieva, N. V., & Man, L. (2024). Digital strategy of language paradigm. Bulletin of Peoples' Friendship University of Russia. Series: Language Theory. Semiotics. Semantics, 15(2), 364-375.
13. Pekareva, V. V., & Frolovskaya, Yu. I. (2024). Semantic analysis of the term "information" for systematizing approaches and factors of information security in the context of digitalization. Agrarian and Land Law, 3(231), 89-92. doi:10.47643/1815-1329_2024_3_89
14. Verner, D. (2024). Integration of artificial intelligence in backend development. Annali d’Italia, 59, 88-91.

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

The subject of the research in the peer-reviewed publication is large language models (LLMs); the work reveals their role in next-generation integrated development environments (IDEs). The research methodology is based on the study and generalization of scientific publications on the topic and on analysis of the application of large language models in the practical work of IT development teams. The authors attribute the relevance of the work to the fact that large language models based on the transformer architecture demonstrate significant progress in artificial intelligence and natural language processing, have quickly found wide application in various industries, including programming and software development, and serve as an important tool in modern integrated development environments, capable of significantly automating processes, increasing development efficiency, and improving code quality. The scientific novelty of the reviewed study consists in the presented results of the analysis of the role of large language models in modern integrated development environments, in the cognitive and social aspects of developers' interaction with such models reflected by the authors, and in the assessment of changes in working methods, team interaction, and programming efficiency. The following sections are highlighted in the text of the article: Introduction; Main part: The evolution of LLMs and their application in IDEs; The role of LLMs in the transformation of modern IDEs; Cognitive interaction of developers with LLMs; LLM application in IDEs: analysis of modern solutions in large companies; Problems and challenges of integrating LLMs into next-generation integrated development environments; Conclusion; and References.
The article analyzes how constant interaction with large language models affects programmers' workload, motivation, and ability to think creatively, as well as data security, intellectual property protection, and the risks of excessive dependence on artificial intelligence technologies. The authors note the importance of ensuring a balance between automation and the development of developer skills in order to prevent a decline in their qualifications. The publication highlights the problems of integrating large language models: data quality and bias, lack of context, intellectual property issues, dependence on technology, and ethical and security issues. The text of the article is illustrated with two tables and three figures and contains several fragments of program code. The bibliography includes 14 sources, scientific publications in Russian and foreign languages, to which the text contains in-text references, confirming engagement with other researchers. The reviewed material corresponds to the scope of the journal "Software Systems and Computational Methods", reflects the results of the work carried out by the authors, and may be of interest to readers, since it contains interesting information about the role of large language models in next-generation integrated development environments, as well as the potential threats and risks of their mass adoption. The article is recommended for publication.