Reference:
Liu M., Shao Q., Xie G.
Multi-Agent Approach to Political Discourse Translation: From Large Language Models to MAGIC-PTF System
// Litera.
2024. № 11.
P. 28-46.
DOI: 10.7256/2409-8698.2024.11.72197 EDN: GFRZMO URL: https://en.nbpublish.com/library_read_article.php?id=72197
Abstract:
This research addresses the automated translation of Chinese political discourse into Russian using Large Language Model (LLM) optimization technologies and a multi-agent approach. The study focuses on developing the MAGIC-PTF system, which implements multi-stage text processing through the interaction of four specialized agents. The system's key component is the Style Agent, which ensures stylistic uniformity and terminological accuracy based on a specifically trained LLM. The Translator Agent performs the primary translation work and is responsible for the final text formatting. The Editor Agent conducts multi-level verification and correction of translations, considering linguistic, semantic, and cultural aspects. The Reader Agent analyzes the text from the target audience's perspective, evaluating its reception by native Russian speakers. The methodology integrates LLM optimization technologies and a multi-agent approach, with experimental testing conducted on the fourth volume of "Xi Jinping: The Governance of China" and its official Russian translation. The study includes a comprehensive analysis of system effectiveness using the COMET metric and comparative testing against existing machine translation platforms. The research's scientific novelty lies in developing a methodology for applying LLMs to specialized translation tasks and creating an effective coordination mechanism for intelligent agents in the translation process. Experimental results demonstrated MAGIC-PTF's superiority over traditional machine translation systems on key parameters: terminological accuracy, stylistic consistency, and preservation of culture-specific elements in political discourse. The developed system opens new possibilities for automated translation of political discourse and can be adapted for translating other specialized text types, confirming its significance for modern translation technology development. Of particular value are the system's scalability and adaptability to various language pairs and discourse types, which create prospects for further development in automated specialized text translation. The research findings also contribute to advancing the theory and practice of LLM application in professional translation.
Keywords:
Chinese-Russian Translation, Machine Translation, Cross-cultural Communication, Specialized Translation, Automated Translation, Intelligent Agents, LLM Optimization Technologies, Large Language Models, Multi-Agent Approach, Political Discourse
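To make the coordination mechanism concrete, below is a minimal Python sketch of a four-agent pipeline in the spirit of the Style/Translator/Editor/Reader division described in the abstract. The prompts, the Agent class, and the generic llm callable are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a four-agent translation pipeline in the spirit of
# MAGIC-PTF as described in the abstract above. All prompts, agent roles
# and the generic `llm` callable are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    system_prompt: str
    llm: Callable[[str, str], str]  # (system_prompt, user_text) -> reply

    def run(self, text: str) -> str:
        return self.llm(self.system_prompt, text)

def translate_political_text(source_zh: str, llm: Callable[[str, str], str]) -> str:
    style = Agent("Style", "Provide terminology and style guidelines for "
                  "translating Chinese political discourse into Russian.", llm)
    translator = Agent("Translator", "Translate the Chinese text into Russian, "
                       "following the guidelines supplied before the text.", llm)
    editor = Agent("Editor", "Check the Russian translation for linguistic, "
                   "semantic and cultural accuracy; return a corrected version.", llm)
    reader = Agent("Reader", "As a native Russian reader, flag passages that "
                   "sound unnatural; return the final polished text.", llm)

    # Sequential coordination: each agent consumes the previous agent's output.
    guidelines = style.run(source_zh)
    draft = translator.run(f"Guidelines:\n{guidelines}\n\nText:\n{source_zh}")
    edited = editor.run(draft)
    return reader.run(edited)
```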
Reference:
Zhikulina C.P., Kostromina V.V.
Computational creativity of neural network Midjourney in a polymodal space
// Litera.
2024. № 6.
P. 1-16.
DOI: 10.7256/2409-8698.2024.6.70890 EDN: COCFNP URL: https://en.nbpublish.com/library_read_article.php?id=70890
Abstract:
This article deals with the polymodal space in the field of computational creativity in neural networks. The object of research is a polymodal environment that integrates a series of heterogeneous codes to express a common idea, and the subject is the possibility of creating polymodal digital art using text and voice prompts in the generative network Midjourney. The aim of the study is to prove that computational creativity can be detected and described based on the results of iterations in the process of creating images, which in turn makes it possible to speak of a complex polymodal system as a separate digital category of polymodality. We used the continuous sampling method to collect linguistic units as they occurred during analysis, and contextual analysis for the systematic identification and description of verbal and non-verbal contexts. An experiment with the generative network Midjourney was conducted to identify patterns in the creation of a graphic space through text and voice input, and the results of the iterations were then compared and contrasted with the original image. The scientific novelty lies in addressing the lack of research on polymodal space in the context of neural networks and their generative capacity. The experiment produced the following results: the term 'polymodality' applies to the generative network Midjourney and its 'digital art' owing to the presence of three channels: verbal, visual, and vocal; tests showed that the neural network's ability to create images from prompts is high, although gross technical errors prevent users from fully reaching the desired result when generating an image; and summarizing the data points to features of computational creativity in generative networks.
Keywords:
Midjourney, neural network, transformational creativity, computational creativity, artificial intelligence, polymodal space, polymodal text, iteration, prompt, summarization
Reference:
Golikov A.A., Akimov D.A., Danilova Y.
Optimization of traditional methods for determining the similarity of project names and purchases using large language models
// Litera.
2024. № 4.
P. 109-121.
DOI: 10.7256/2409-8698.2024.4.70455 EDN: FRZANS URL: https://en.nbpublish.com/library_read_article.php?id=70455
Abstract:
The subject of the study is the analysis and improvement of methods for determining the relevance of project names to the information content of purchases using large language models. The object of the study is a database containing the names of projects and purchases in the field of the electric power industry, collected from open sources. The authors examine in detail the use of TF-IDF and cosine similarity metrics for primary data filtering, and describe the integration and evaluation of large language models such as GigaChat, GPT-3.5, and GPT-4 in text data matching tasks. Special attention is paid to methods of refining name similarity through reflection introduced into the prompts of large language models, which increases the accuracy of data comparison. The study uses TF-IDF and cosine similarity for primary data analysis, and the large language models GigaChat, GPT-3.5, and GPT-4 for detailed verification of the relevance of project names and purchases, including reflection in model prompts to improve the accuracy of results. The novelty of the research lies in a combined approach to determining the relevance of project names and purchases, uniting traditional methods of text processing (TF-IDF, cosine similarity) with the capabilities of large language models. A special contribution of the authors is the proposed methodology for improving the accuracy of data comparison by refining the results of primary selection with the GPT-3.5 and GPT-4 models using optimized prompts that include reflection. The main conclusions of the study are confirmation of the prospects of the developed approach for information support of procurement processes and project implementation, and the applicability of the results to the development of text data mining systems in various sectors of the economy. The study showed that the use of language models raises the F2 measure to 0.65, indicating a significant improvement in the quality of data comparison compared with the baseline methods.
Keywords:
business process optimization, projects and procurement, relevance determination, reflection in prompts, textual data analysis, GPT-4, GigaChat, large language models, cosine similarity, TF-IDF
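For illustration, here is a minimal Python sketch of the two-stage matching the abstract describes: TF-IDF vectors with cosine similarity for coarse filtering, followed by an LLM check of the surviving pairs using a reflection-style prompt. The threshold, prompt wording, and generic llm callable are assumptions; the paper's exact prompts and settings may differ.

```python
# Sketch of two-stage project/purchase matching: TF-IDF + cosine similarity
# for coarse filtering, then an LLM check of the surviving candidate pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def candidate_pairs(projects, purchases, threshold=0.3):
    vec = TfidfVectorizer()
    # Fit one vocabulary over both name lists so the vectors are comparable.
    matrix = vec.fit_transform(projects + purchases)
    proj_m, purch_m = matrix[:len(projects)], matrix[len(projects):]
    sims = cosine_similarity(proj_m, purch_m)
    return [(i, j, sims[i, j])
            for i in range(len(projects))
            for j in range(len(purchases))
            if sims[i, j] >= threshold]

def llm_verdict(llm, project, purchase):
    # Reflection-style prompt: ask the model to reason before answering.
    prompt = (f"Project: {project}\nPurchase: {purchase}\n"
              "First explain step by step whether the purchase is relevant "
              "to the project, then answer strictly YES or NO.")
    return llm(prompt).strip().endswith("YES")
```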
Reference:
Lemaev V.I., Lukashevich N.V.
Automatic classification of emotions in speech: methods and data
// Litera.
2024. № 4.
P. 159-173.
DOI: 10.7256/2409-8698.2024.4.70472 EDN: WOBSMN URL: https://en.nbpublish.com/library_read_article.php?id=70472
Abstract:
The subject of this study is the data and methods used for automatic recognition of emotions in oral speech. This task has gained great popularity recently, primarily due to the emergence of large labeled datasets and the development of machine learning models. The classification of speech utterances is usually based on six archetypal emotions: anger, fear, surprise, joy, disgust, and sadness. Most modern classification methods rely on machine learning and transformer models trained with a self-supervised approach, in particular Wav2vec 2.0, HuBERT, and WavLM, which are considered in this paper. English and Russian emotional speech datasets are analyzed, in particular the Russian datasets Dusha and RESD. The method is an experiment comparing the results of the Wav2vec 2.0, HuBERT, and WavLM models on the relatively recently collected Russian emotional speech datasets Dusha and RESD. The main purpose of the work is to analyze the availability and applicability of existing data and approaches to speech emotion recognition for the Russian language, for which relatively little research has been conducted so far. The best result, 0.8782 by the Accuracy metric, was demonstrated by the WavLM model on the Dusha dataset. The WavLM model also achieved the best result on the RESD dataset, 0.81 by the Accuracy metric, after preliminary training on Dusha. The high classification results, owing primarily to the quality and size of the collected Dusha dataset, indicate the prospects for further development of this area for the Russian language.
Keywords:
WavLM, HuBERT, Wav2vec, transformers, machine learning, emotion recognition, speech recognition, natural language processing, Dusha, RESD
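As an illustration of the model family discussed, here is a minimal Python sketch of emotion classification with a pretrained WavLM encoder via the Hugging Face transformers library. The checkpoint name, the six-label set, and the omission of fine-tuning on Dusha/RESD are simplifying assumptions, not the authors' setup.

```python
# Minimal sketch of audio emotion classification with a pretrained WavLM
# encoder. The classification head here is randomly initialized; the
# fine-tuning on Dusha/RESD described in the abstract is omitted.
import torch
from transformers import AutoFeatureExtractor, WavLMForSequenceClassification

LABELS = ["anger", "fear", "surprise", "joy", "disgust", "sadness"]

extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
model = WavLMForSequenceClassification.from_pretrained(
    "microsoft/wavlm-base-plus", num_labels=len(LABELS))

def classify(waveform, sampling_rate=16000):
    # waveform: 1-D float array containing speech sampled at 16 kHz
    inputs = extractor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
```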
Reference:
Zhikulina C.P.
Alice's Tales: the transformation of the composition, fairy-tale formulas and contexts of the voice assistant in the skill "Let's come up with"
// Litera.
2024. № 2.
P. 45-64.
DOI: 10.7256/2409-8698.2024.2.69760 EDN: AQYOMS URL: https://en.nbpublish.com/library_read_article.php?id=69760
Abstract:
The subject of the study is the text spontaneously generated by the voice assistant Alice when creating a fairy tale together with the user, and the purpose of the study is to analyze the transformation of structure, fairy-tale formulas, and context in terms of the selection of linguistic elements and meanings by artificial intelligence technology. Particular attention is paid to the skill "Let's come up with", which became available to users in the spring of 2023. The collision and interaction of folklore canons with the realities of the 21st century produce an ambivalent reaction to the interactive opportunity to play the role of a storyteller together with a voice assistant. The main research method was continuous sampling, used to identify the steps, stages, and actions involved in creating a fairy-tale plot together with the voice assistant. In addition, comparative and contextual analyses were used to identify similarities and differences between traditional Russian fairy tales and the spontaneously generated fairy-tale plot. To obtain the data and analyze its components, a linguistic experiment with the voice assistant Alice from Yandex was conducted and described. The rapid development of neural network language models speaks to the scientific novelty of the material under study, since this area is unexplored and changes very quickly. It is important to emphasize that, to date, the texts of spontaneously generated fairy tales, their structural division, and the correspondence of their fairy-tale formulas to folklore canons have not been studied. The main conclusion of the study is that the user's share in creating a fairy tale with the voice assistant Alice is greatly exaggerated.
Keywords:
spontaneous text generation, context, spacetime, folklore formula, composition, neural network, GPT, voice assistant Alice, artificial intelligence, communication
Reference:
Zhikulina C.P.
Siri and the skills of encoding personal meanings in the context of English speech etiquette
// Litera.
2023. № 12.
P. 338-351.
DOI: 10.25136/2409-8698.2023.12.69345 EDN: KZVBFU URL: https://en.nbpublish.com/library_read_article.php?id=69345
Abstract:
The subject of the study is the content of personal meanings in greeting questions within Siri's English communication formulas. The object of the study is the voice assistant's ability to simulate spontaneous dialogue with a person and the adaptation of artificial intelligence to natural speech. The purpose of the study is to identify the features and level of Siri's language skills in communicating with users in English. The following aspects are considered in detail: the problem of understanding in two types of communication, 1) between a person and a person and 2) between a machine and a person; the use of stable communication formulas by artificial intelligence in response to the question "How are you?"; and the level and speech-making potential of the voice assistant's responses. The following methods were used: descriptive, comparative, and contextual methods, as well as a linguistic experiment. The scientific novelty is that the problems of encoding the personal meanings of the Siri voice assistant have not previously been studied in detail in philology and linguistics. Due to the widespread use of voice systems in various spheres of social and public life, there is a need to analyze speech errors and describe communication failures in dialogues between voice assistants and users. The main conclusions of the study are: 1) the machine is not able to generate answers based on the experience of past impressions; 2) deviations from the norms of English speech etiquette in Siri's responses are insignificant but often lead to communicative failures; 3) one-sided encoding of personal meaning was found in the responses: from the machine to the person, but not vice versa.
Keywords:
stable communication formulas, colloquial speech, encoding, English speech etiquette, dialogue, communication, personal meaning, AI, Siri, voice assistant
Reference:
Golikov A., Akimov D., Romanovskii M., Trashchenkov S.
Aspects of creating a corporate question-and-answer system using generative pre-trained language models
// Litera.
2023. № 12.
P. 190-205.
DOI: 10.25136/2409-8698.2023.12.69353 EDN: FSTHRW URL: https://en.nbpublish.com/library_read_article.php?id=69353
Abstract:
The article describes various ways to use generative pre-trained language models to build a corporate question-and-answer system. A significant limitation of current generative pre-trained language models is the limit on the number of input tokens, which does not allow them to work "out of the box" with a large number of documents or with a single large document. To overcome this limitation, the paper considers indexing documents and then performing search queries and response generation based on two of the most popular open-source solutions at the moment: the Haystack and LlamaIndex frameworks. It is shown that the open-source Haystack framework with the best settings yields more accurate answers when building a corporate question-and-answer system than the open-source LlamaIndex framework, but requires several times more tokens on average. The article uses comparative analysis to evaluate the effectiveness of generative pre-trained language models in corporate question-and-answer systems built with the Haystack and LlamaIndex frameworks. The results were evaluated using the EM (exact match) metric. The main conclusions of the research are: 1. Hierarchical indexing is currently extremely expensive in terms of the number of tokens used (about 160,000 tokens for hierarchical indexing versus 30,000 tokens on average for sequential indexing), since the response is generated by sequentially processing parent and child nodes. 2. Processing information with the Haystack framework at its best settings yields somewhat more accurate answers than the LlamaIndex framework (0.7 vs. 0.67 with the best settings). 3. The accuracy of the Haystack framework's responses is less sensitive to the number of tokens per chunk. 4. On average, the Haystack framework is more expensive in terms of the number of tokens (about 4 times) than the LlamaIndex framework. 5. The "create and refine" and "tree summarize" response generation modes of the LlamaIndex framework are approximately equal in response accuracy, but the "tree summarize" mode requires more tokens.
Keywords:
token, exact match, chunk, LlamaIndex, Haystack, QA-system, indexing, information retrieval system, generative language models, retriever
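For orientation, here is a framework-neutral Python sketch of the evaluation loop implied by the abstract: split documents into sequential chunks, retrieve the most similar chunks for a question, generate an answer, and score it with exact match (EM). Plain TF-IDF retrieval stands in for the Haystack and LlamaIndex pipelines; their actual APIs are not reproduced here.

```python
# Framework-neutral sketch: sequential chunking, similarity-based retrieval,
# answer generation via a generic `llm` callable, and EM scoring.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def chunk(text, size=200):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def exact_match(prediction: str, gold: str) -> int:
    return int(prediction.strip().lower() == gold.strip().lower())

def answer(llm, question, chunks, top_k=3):
    vec = TfidfVectorizer()
    m = vec.fit_transform(chunks + [question])
    q_vec, chunk_m = m[len(chunks)], m[:len(chunks)]
    sims = cosine_similarity(q_vec, chunk_m).ravel()
    context = "\n\n".join(chunks[i] for i in sims.argsort()[::-1][:top_k])
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer briefly.")

def evaluate(llm, documents, qa_pairs):
    chunks = [c for d in documents for c in chunk(d)]
    scores = [exact_match(answer(llm, q, chunks), a) for q, a in qa_pairs]
    return sum(scores) / len(scores)  # mean EM over the test set
```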
Reference:
Maikova T.
On the Concept of Translation Unit in a Machine Translation Framework
// Litera.
2023. № 12.
P. 352-360.
DOI: 10.25136/2409-8698.2023.12.69470 EDN: LAWSMV URL: https://en.nbpublish.com/library_read_article.php?id=69470
Abstract:
The article examines whether the concept of the translation unit applies to machine translation and whether the size of the unit influences translation quality. While modern machine translation systems offer an acceptable level of quality, a number of problems, mainly related to the structural organization of the text, remain unresolved; hence the question posed in the paper. The article reviews modern readings of the concept and pays special attention to whether the scope of the term changes depending on whether the object of research is the target text or the translation process. The paper also briefly surveys the research methods of both text-oriented and process-oriented approaches, such as comparative analysis of language pairs and the Think-Aloud Protocol. Based on a review of existing machine translation models, each is analyzed to determine whether a unit of translation can be defined for the given system and what its size is. It is concluded that a unit of translation can be viewed either as a unit of analysis or as a unit of processing, corresponding to text-oriented and process-oriented perspectives on the study of translation. The unit of translation is dynamic in character and influences the quality of the target text. In machine translation, the unit of translation as a unit of analysis is not applicable to systems based on probabilistic, non-linguistic methods. For rule-based machine translation systems, both readings of the concept are applicable but hardly go beyond a single sentence. Accordingly, at least one type of translation problem, the resolution of intra-textual relations, remains largely unaddressed in the present state of machine translation.
Keywords:
Neural Machine Translation, Statistical Machine Translation, Rule-Based Translation, comparative analysis, Think-Aloud-Protocol, unit of processing, unit of analysis, machine translation, translation unit, hybrid systems
Reference:
Zaripova D.A., Lukashevich N.V.
Automatic Generation of Semantically Annotated Collocation Corpus
// Litera.
2023. № 11.
P. 113-125.
DOI: 10.25136/2409-8698.2023.11.44007 EDN: QRBQOI URL: https://en.nbpublish.com/library_read_article.php?id=44007
Abstract:
Word Sense Disambiguation (WSD) is a crucial initial step in automatic semantic analysis. It involves selecting the correct sense of an ambiguous word in a given context, which can be challenging even for human annotators. Supervised machine learning models require large datasets with semantic annotation to be effective. However, manual sense labeling can be a costly, labor-intensive, and time-consuming task. Therefore, it is crucial to develop and test automatic and semi-automatic methods of semantic annotation. Information about semantically related words, such as synonyms, hypernyms, hyponyms, and the collocations in which a word appears, can be used for these purposes. In this article, we describe our approach to generating a semantically annotated collocation corpus for the Russian language. Our goal was to create a resource that could be used to improve the accuracy of WSD models for Russian. The article outlines the process of generating the corpus and the principles used to select collocations. To disambiguate words within collocations, semantically related words defined on the basis of RuWordNet are utilized; the same thesaurus also serves as the source of sense inventories. The methods described in the paper yield an F1-score of 80% and add approximately 23% of collocations with at least one ambiguous word to the corpus. Automatically generated collocation corpora with semantic annotation can simplify the preparation of datasets for developing and testing WSD models, and can also serve as a valuable source of information for knowledge-based WSD models.
Keywords:
Sense Inventory, Collocation Corpus, Automatic Corpus Generation, Semantic Annotation, Word Sense Disambiguation, Automatic Semantic Analysis, Natural Language Processing, Related Words, SyntagNet, Thesaurus
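To illustrate the disambiguation idea, here is a schematic Python sketch: choose the sense whose thesaurus-related words (synonyms, hypernyms, hyponyms) overlap most with the other words of the collocation. The toy sense inventory stands in for RuWordNet, and the scoring is a simplification of what the paper may actually use.

```python
# Schematic knowledge-based WSD for words inside collocations: pick the sense
# whose related words overlap most with the rest of the collocation.
def disambiguate(word, collocation_words, sense_inventory):
    """sense_inventory: {sense_id: set of thesaurus-related words} for `word`."""
    context = {w.lower() for w in collocation_words if w.lower() != word.lower()}
    best_sense, best_overlap = None, 0
    for sense_id, related in sense_inventory.items():
        overlap = len(context & {w.lower() for w in related})
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense  # None if no sense is supported by the context

# Toy example with hypothetical senses of "bank":
inventory = {
    "bank.finance": {"money", "account", "credit", "loan"},
    "bank.river": {"river", "shore", "water", "slope"},
}
print(disambiguate("bank", ["river", "bank"], inventory))  # -> bank.river
```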