Litera
Reference:

On the Concept of Translation Unit in a Machine Translation Framework

Maikova Tatyana

Senior Lecturer of the Department of Foreign Languages, Faculty of Humanities and Social Sciences at Peoples' Friendship University of Russia

117198, Russia, g. Moscow, ul. Miklukho-Maklaya, 10/2, of. 502

tatli@mail.ru
DOI:

10.25136/2409-8698.2023.12.69470

EDN:

LAWSMV

Received:

23-12-2023


Published:

30-12-2023


Abstract: The article examines whether the concept of the translation unit applies to machine translation and whether the size of the unit influences translation quality. While modern machine translation systems offer an acceptable level of quality, a number of problems, mainly related to the structural organization of the text, remain unresolved, hence the question posed in the paper. The article reviews modern readings of the concept and pays special attention to whether the scope of the term changes depending on whether the object of research is the target text or the translation process. The paper also briefly surveys the research methods of the text-oriented and process-oriented approaches, such as comparative analysis of language pairs and the think-aloud protocol. Existing machine translation models are then reviewed in turn to determine whether a unit of translation can be defined for a given system and what its size is. It is concluded that a unit of translation can be viewed as either a unit of analysis or a unit of processing, corresponding to the text-oriented and process-oriented perspectives on the study of translation. The unit of translation is dynamic in character and influences the quality of the target text. In machine translation, the unit of translation as a unit of analysis is not applicable to systems based on probabilistic non-linguistic methods. For rule-based machine translation systems, both readings of the concept are applicable, but the unit hardly goes beyond a single sentence. Accordingly, at least one type of translation problem, the resolution of intra-textual relations, remains largely unaddressed in the present state of machine translation.


Keywords:

translation unit, machine translation, unit of analysis, unit of processing, Think-Aloud-Protocol, comparative analysis, Rule-Based Translation, Statistical Machine Translation, Neural Machine Translation, hybrid systems

This article is automatically translated.

It is difficult to deny that machine translation occupies a growing place in translation practice. The technology has transformed global communication, facilitating networking, collaboration and mutual understanding around the world. Today, machine translation systems demonstrate a fairly high level of quality, especially in informative translation between widely spoken languages. At the same time, a number of problems remain unresolved. Machine translation still struggles with context, intra-textual relations, idiomatic expressions, and domain terminology. With the first two problems in view, this work explores whether the theoretical concept of the translation unit is applicable to machine translation in the light of modern views on the concept, and how the size of the unit being isolated affects translation quality.

There are many approaches to defining the unit of translation. In Russian translation studies, the issue has been addressed by V.N. Komissarov, L.S. Barkhudarov, I.I. Revzin, V.Yu. Rosenzweig, Yu.N. Marchuk, Ya.I. Retsker, I.S. Alekseeva and others. The term, proposed by J.-P. Vinay and J. Darbelnet, denotes a fragment of text treated as a single cognitive unit for the purpose of establishing equivalence.

L.S. Barkhudarov considers a translation unit to be the smallest segment of the source text that has an equivalent in the target text; any language unit can act as a translation unit, from the smallest structural elements of the language system up to whole texts [2, p.3]. M. Shuttleworth and M. Cowie [12, p. 192] define the unit of translation as a term denoting the level of language (phonemic, morphemic, lexical, syntactic) at which the source text is recoded into the target language. Commenting on Barkhudarov's definition, Shuttleworth and Cowie note that the size and linguistic type of a translation unit are determined by the specific translation task and can change over the course of a text or even a single sentence [ibid]. According to W. Koller [8, p. 100], the size of translation units may be determined by the degree of structural proximity between the source and target languages. He considers it quite likely that translation between unrelated languages will involve larger units than translation between closely related languages.

I.S. Alekseeva suggests four ways to determine the units of translation [1, p. 149]. When translation is studied as a process, the unit of translation is considered to be "the minimum length of text that acts as an independent object of the translation process". Most often, this role is played by the sentence. The second approach focuses on the target text, and "a minimum set of lexemes or grammemes is taken as a unit of translation, which can be aligned with the grammatical category of the translation" [1, p. 149]. The third approach is based on the analysis of the content plane: the content of the text is divided into elementary meanings, which are treated as units of translation. Finally, the fourth way to isolate a translation unit uses the principle of semantic unity. "The unit of translation here is considered to be the minimum linguistic unit of the original text, perceived as a single whole from the point of view of semantics" [ibid]. In this case, the translation unit may have a complex structure, but its individual parts cannot be translated separately.

Let us take a closer look at the first two approaches. As their definitions show, they rest on two different ideas of what translation is: a text in the target language, or the activity of creating a text in the target language. Research on translation as a final text addresses such topics as the features of translated texts and the relationship between source and translated texts, whereas research on translation as a process is devoted to the translation activity itself, including the cognitive processes underlying the creation of a translation. Accordingly, the two approaches rely on entirely different research methods. Text-oriented research mostly uses comparative analysis of language pairs identified with the help of language corpora or appropriate search tools. The methods for studying the translation process, in turn, are related to those of the cognitive sciences, in particular psycholinguistics. It should be noted, however, that research focused on the final text and research focused on the translation process are not strictly separated. Some researchers analyze the final text of the translation while paying attention to the steps leading from the source text to the target text, while others describe the process to some extent in terms of the relationship between the original and the translation [14].

Some foreign translation theorists believe that the concept of a translation unit actually changes its content depending on whether the object of research is the final text or the translation process. In text-oriented approaches, a translation unit can be understood as a unit of analysis, whereas in process-oriented research it primarily means a unit of processing [15, p.254]. In text-oriented approaches, the main subjects of research are the characteristic features of translated texts, such as the relationship between the source and translated texts, as well as the comparison of different translations of the same originals, both into one and into several languages. What such studies have in common is that researchers examine the translated text in comparison with the original, that is, objects that existed before the observation began.

In process-oriented translation studies, the object of study is the activity of the translator. A. Hurtado Albir and F. Alves argue that "the translation unit should be considered both as a unit of understanding and as a unit of processing, that is, as a dynamic segment of the source text, independent of the specific size or shape to which the translator's attention is currently directed..." [5, p.238]. Thus, if the translator's understanding of the source text can be regarded as a kind of analysis, then an element of analysis is present in this view of the unit of translation as well. A more important aspect, however, is the dynamic nature of the processing unit, that is, its ability to change length and linguistic type during the translator's work.

Studies of translation as a process have not revealed a strictly defined sequence of steps performed in every act of translation. Observations using the think-aloud protocol (TAP), in which subjects are asked to say aloud everything that comes to mind during the translation task while being audio- or video-recorded, have shown that the translation process is influenced by many factors, including the translator's qualifications, the translation situation, and the type of translation task. Accordingly, the unit of translation, or unit of processing, can also be expected to vary significantly in the course of a translation. For example, some studies show that the unit of translation used by novice translators is usually a single word, whereas experienced translators tend to identify and translate units of meaning realized as phrases, predicative phrases or sentences [9].

Studies of the translation unit in the context of the translation process allow us to draw several important conclusions. Firstly, since the processing unit is dynamic in nature and largely depends on the translation task and situation, it is difficult to give the concept a general definition. Secondly, it is impossible to determine from the result of the translation act which units were used in the translation process. Thirdly, the quality of the final text indicates whether the translator has chosen units of the size needed to create a translated text of proper quality. If the processing units are too small, the translator may not be able to choose the optimal expressions for the output text [11, p. 358]. As K. Malmkjær notes [10, p.286], "... target texts in which the units [of analysis] are larger look more acceptable than those in which the units are smaller." The researcher goes on to conclude that the main unit of translation is predication.

The concept of a translation unit can also shed light on some problems in the field of machine translation. Machine translation (MT) is the processing of a text in the source language by computational methods in order to obtain a text in the target language. Machine translation is an interdisciplinary field, and the task has been approached from various perspectives, including linguistics and statistics. Machine translation systems are usually divided into rule-based machine translation (RBMT) and statistical machine translation (SMT). In recent years, a new approach, neural machine translation (NMT), has been actively developed. SMT and NMT can be characterized as non-linguistic methods, in contrast to the linguistic RBMT approach [3].

Within the linguistic RBMT approach, the concept of a translation unit is used as an analytical concept. When the translated text is being created, the translation unit becomes the processing unit for the algorithm used by the system. Thus, in linguistic machine translation, translation units are both units of analysis and units of processing [6]. Historically, the types of translation units used in rule-based machine translation have varied somewhat. First-generation systems worked with word-level translation units. These systems used direct translation strategies, that is, they mapped the words of the source text directly onto words of the target language. The input text was transformed into the output text by sequentially replacing words with their target-language equivalents in accordance with basic grammar rules governing word order, verb tense forms, subject-predicate agreement, and so on. Structural analysis of the input text was minimal [13], and analysis of context and meaning was absent altogether. Direct translation systems were essentially implementations of bilingual dictionaries supplemented with rules of syntactic restructuring to account for structural differences between the source and target languages; they did not distinguish between analysis (processing of the source text) and synthesis (creation of the translated text).
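The direct translation strategy described above can be illustrated with a minimal sketch. It is not taken from any real system: the English-to-Spanish lexicon, the word classes and the single reordering rule are all hypothetical and chosen only to show word-level units being substituted one by one, with no analysis of context or meaning.

```python
# Toy English-to-Spanish lexicon; every entry is an illustrative assumption.
LEXICON = {
    "the": "el",
    "black": "negro",
    "cat": "gato",
    "sleeps": "duerme",
}
# Hypothetical word classes used by the single reordering rule below.
ADJECTIVES = {"black"}
NOUNS = {"cat"}


def direct_translate(sentence: str) -> str:
    """Translate word by word, with one local word-order rule."""
    words = sentence.lower().split()

    # Syntactic restructuring rule: adjective + noun -> noun + adjective.
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1

    # Word-for-word substitution; unknown words pass through unchanged.
    return " ".join(LEXICON.get(word, word) for word in reordered)


print(direct_translate("The black cat sleeps"))  # el gato negro duerme
```

In this sketch the translation unit never exceeds one word, which is why anything that depends on a wider context (agreement, ambiguity, reference) is out of reach for such a design.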

Direct translation systems could not deliver high-quality translation and gave way to more advanced transfer-based systems. Transfer is a stage of interlingual operations that builds an intermediate syntactic representation adapted to the sentence structure of the target language. Unlike the direct translation strategy, the architecture of transfer-based MT systems includes separate analysis and synthesis procedures handled by separate algorithms.

The further development of such systems has led to the emergence of machine translation based on a deep linguistic analysis of the source text at all linguistic levels (morphological, syntactic, semantic, pragmatic) and equally multilevel generation of the target text. This principle has been embodied in machine translation systems based on interlingua, an abstract representation of the source text that does not depend on the grammar of a given pair of languages.

A common feature of indirect approaches is that the first step in the translation procedure is an analysis stage, which creates a formal, system-specific representation of the syntactic structure of the source expression. The main unit of translation is therefore usually the sentence, the basic unit of syntax and the largest domain of grammatical analysis. On the other hand, a sentence-level system may produce an unsatisfactory translation if the input is not recognized as a syntactically complete sentence of the source language. The failure may occur either because the source sentence contains a grammatical structure not covered by any rule in the system, or because the input is not a complete sentence. However, some rule-based systems also process units at the predication level: when the system cannot analyze the input sentence as a whole, it may still recognize parts of it as independent syntactic units.
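The fallback from the sentence to smaller syntactic units can be sketched as follows. This is only an illustration under stated assumptions: the part-of-speech table, the list of "known" sentence patterns and the conjunction-based splitting are hypothetical stand-ins for the grammar of a real rule-based system.

```python
# Toy part-of-speech table and grammar; both are illustrative assumptions.
POS = {"the": "det", "cat": "noun", "dog": "noun", "sleeps": "verb", "barks": "verb"}
KNOWN_PATTERNS = {("det", "noun", "verb"), ("noun", "verb")}
CONJUNCTIONS = {"and", "but"}


def parses(words):
    """Return True if the word sequence matches a pattern the toy grammar covers."""
    return tuple(POS.get(word) for word in words) in KNOWN_PATTERNS


def choose_units(sentence):
    """Pick the sentence as the unit if it parses, else fall back to clauses."""
    words = sentence.lower().rstrip(".").split()
    if parses(words):
        return [" ".join(words)]  # the whole sentence is the unit

    # Fallback: split at conjunctions and keep the parts that parse on their own.
    clauses, current = [], []
    for word in words:
        if word in CONJUNCTIONS:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(word)
    if current:
        clauses.append(current)
    return [" ".join(clause) for clause in clauses if parses(clause)]


print(choose_units("The cat sleeps."))                    # ['the cat sleeps']
print(choose_units("The cat sleeps and the dog barks."))  # ['the cat sleeps', 'the dog barks']
```

Either way, the unit chosen stays inside the boundaries of one sentence, which matches the observation that relations between sentences are not available to this kind of analysis.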

Whereas RBMT systems translate on the basis of information about the source and target languages and their interrelationships, statistical machine translation (SMT) systems translate on the basis of statistical (probabilistic) information about recurring patterns in large corpora of parallel texts. The corpus is used to train the system and serves as the data source for calculating the probabilities on which translations are based. Translation relies on information about correspondences between sequences of words, or N-grams, in a bilingual corpus. An N-gram is a sequence of N consecutive words; N-gram models estimate the probability of the next word given the preceding N-1 words. The maximum value of N may vary from system to system [4].
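The N-gram prediction mentioned above can be made concrete with a small sketch for N = 2 (bigrams). It is a monolingual toy model estimated by simple relative frequency, not a full SMT system: the three-sentence corpus and the function names are assumptions introduced for illustration only.

```python
from collections import Counter


def train_bigram_model(corpus):
    """Count bigrams and their left-hand contexts in a list of sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        contexts.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams


def next_word_probability(model, previous, word):
    """Relative-frequency estimate of P(word | previous)."""
    contexts, bigrams = model
    if contexts[previous] == 0:
        return 0.0  # unseen context: this sketch uses no smoothing
    return bigrams[(previous, word)] / contexts[previous]


corpus = ["the cat sleeps", "the cat eats", "the dog sleeps"]
model = train_bigram_model(corpus)
print(next_word_probability(model, "the", "cat"))     # 2 of 3 occurrences of "the"
print(next_word_probability(model, "cat", "sleeps"))  # 0.5
```

Nothing in the model refers to words as nouns, verbs or clauses; the probabilities come only from counts, which is the sense in which such methods are called non-linguistic.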

It should be noted that for SMT systems the N-gram cannot be considered a translation unit in the strict sense: in translation theory the translation unit is primarily a linguistic concept, whereas SMT methods work without linguistic information about the source and target languages and are therefore classified as non-linguistic machine translation methods. Nevertheless, although the N-gram cannot be considered a unit of analysis in the sense of translation theory, it can be regarded as a unit of processing determined by the algorithm implemented in a specific SMT system [15, p. 254].

Probabilistic methods are also used in neural machine translation (NMT) systems, which are built on artificial neural networks, computing systems modeled loosely on biological neural networks. NMT models are trained not on N-grams but on representations of complete sentences in the source and target languages. Even in this case, words remain important units in the source and translated texts, but "the connections between source and target words, phrases and sentences of the source text and the translation text are established only implicitly, as mappings between their continuous representations" [7, p. 1701].

The fact that no single method can achieve a satisfactory level of accuracy on its own has led to the emergence of hybrid machine translation systems, which combine different approaches to machine translation within one system, for example RBMT and SMT. One method uses RBMT to create a translation and then refines the result using SMT. In another method, the process runs in the opposite direction: statistical translation is used to analyze the text, and rule-based translation is used to correct the final translation. Thus, both neural and hybrid machine translation systems use probabilistic (non-linguistic) methods, which means that a theoretical concept such as the "translation unit" does not appear relevant to neural and hybrid machine translation systems.

Conclusions:

There are two different approaches to the translation-theoretic concept of a translation unit, which is due to the fundamental differences between text-oriented and process-oriented approaches to translation research. Within the framework of text-oriented research, the unit of translation can be interpreted as a unit of analysis, and within the framework of process-oriented research as a unit of processing, a cognitive unit on which the translator's attention is focused.

From the point of view of machine translation, the concept of a translation unit is of little use for statistical methods, and even less so for neural machine translation. However, both interpretations of the concept are applicable to rule-based machine translation systems. In such systems, the unit of analysis refers to the types of source-text fragments that the system can identify, and the unit of processing refers to how the translation algorithm operates on the analyzed source text to create the target text.

The quality of the translation depends directly on the size of the source-text segment selected as the translation unit: the larger the segment, the higher the quality. As follows from the above, translation units in machine translation tend to grow larger but do not go beyond a single sentence, or else the concept is not applicable at all, as is the case with probabilistic non-linguistic methods. Accordingly, at least one type of translation problem, the resolution of intra-textual relations, remains unsolved at the current stage of development of machine translation.

References
1. Alekseeva, I.S. (2004). Введение в переводоведение [Introduction to Translation Studies]. SPb: Akademia.
2. Barkhudarov, L.S. (1969). Уровни языковой иерархии и перевод [Levels of language hierarchy and translation]. In Tetradi perevodchika [Translator’s notebook], 6, 4-12.
3. Butusova, A.S., & Bets, Y.V. (2021). Машинный и автоматизированный перевод: учебное пособие [Machine and automated translation: Manual]. Rostov-on-Don, Taganrog: SFU Publishers.
4. Gudkov, V.Y., Gudkova, E.F. (2011). N-граммы в лингвистике [N-gram in linguistics]. In Bulletin of Chelyabinsk State University, 24(239). Philology. Art History, 57, 69-71.
5. Hurtado Albir, A., Alves, F. (2009). Translation as a cognitive activity. In Munday J. (ed), The Routledge Companion to Translation Studies (54-73). London: Routledge.
6. Hutchins, W.J., & Somers, H.L. (1992). An Introduction to Machine Translation. London: Academic Press.
7. Kalchbrenner, N., Blunsom, P. (2013). Recurrent continuous translation models. In Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing (EMNLP) (1700–1709). Association for Computational Linguistics.
8. Koller, W. (1992). Einführung in die Übersetzungswissenschaft. Heidelberg: Quelle & Meyer.
9. Lörscher, W. (1991). Translation Performance, Translation Process and Translation Strategies: A Psycholinguistic Investigation. Tübingen: Günter Narr.
10. Malmkjær, K. (1998). Unit of translation. In Baker M (ed), Routledge Encyclopedia of Translation Studies (286–287). London, New York: Routledge.
11. Sorvali, I. (2004). The problem of the unit of translation. In Kittel H., Frank A.P., Greiner N., Hermans T., Koller W., Lambert J., Paul F. (eds), Übersetzung – Translation – Traduction. An International Encyclopedia of Translation Studies, 1(354-362). Berlin, New York: Walter de Gruyter.
12. Shuttleworth, M., Cowie, M. (1997). Dictionary of Translation Studies. Manchester, UK, USA: St. Jerome Publishing.
13. Peng, L. (2013). A Survey of Machine Translation Methods. TELKOMNIKA Indonesian Journal of Electrical Engineering, 11(12), 7125-7130. doi:10.11591/telkomnika.v11i12.2780
14. Thunes, M. (2011). Complexity in Translation. An English-Norwegian Study of Two Text Types. PhD thesis. University of Bergen. Retrieved from https://bora.uib.no/bora-xmlui/handle/1956/5179
15. Thunes, M. (2017). The concept of ‘translation unit’ revisited. Bergen Language and Linguistics Studies, 8(1), 241-259. doi:10.15845/bells.v8i1.133

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

The reviewed article examines the specifics of machine translation through the concept of the translation unit. The author not only clarifies this line of inquiry theoretically but also tests the link in practice. The relevance of the issue is beyond doubt; the researcher comments on it directly: "it is difficult to deny that machine translation occupies a growing place in translation activities. This technology has transformed global communication, facilitating networking, collaboration and mutual understanding around the world. Today, machine translation systems demonstrate a fairly high level of quality, especially in the framework of informative translation for widely spoken languages. At the same time, a number of problems remain unresolved. Machine translation still faces problems when dealing with context, intertextual relationships, idiomatic expressions, and domain terminology." I believe that the research methodology is in line with a number of recent scientific developments, and its soundness raises no doubts or objections. The dialogical nature of the article is evident: "some foreign translation theorists believe that depending on whether the final text or the translation process is the object of research, the concept of a translation unit actually changes its content. Within the framework of end-text-oriented approaches, a translation unit can be understood as a unit of analysis, whereas in process-oriented research, it primarily means a unit of processing. Within the framework of text-oriented approaches, the main subjects of research are the characteristic features of translated texts, such as the relationship between the source and translated texts, as well as the comparison of different translations of the same originals, both into one and into several languages. 
What is common to such studies is that researchers consider the translation text in comparison with the original one, that is, objects that existed before the observation began." The effect of contact is also noticeable within the framework of [reader – author], which is important for a greater assessment of the material. The scientific novelty of the research lies in the systematic assessment of the "unit of translation", the comparison of this concept with the context, text, situation. I note that the main blocks of the work are harmoniously sustained – style – language – form of presentation – correlate with the scientific type itself: "studies of translation as a process have not revealed the existence of a strictly defined sequence of steps performed in any act of translation. Observations using the TAP protocol (Think-Aloud-Protocol), in which subjects are asked to speak aloud to audio or video recording everything that comes to mind during the translation task, showed that the translation process is influenced by many factors determined by the translator's qualifications, the translation situation, the type of translation task, etc. Accordingly, it can be expected that the unit of translation or the unit of processing may also vary significantly during the execution of the translation. For example, some studies show that the unit of translation used by novice translators is usually a single word, and experienced translators tend to identify and translate units of meaning implemented in phrases, predicative phrases or sentences," or "direct translation systems could not provide high quality translation and were modified into more advanced ones systems based on transfer – the stage of interlanguage operations, which consisted in building an intermediate syntactic representation adapted to the structure of the sentence in the target language. 
Unlike the direct translation strategy, the architecture of transfer-based CAT systems includes separate analysis and synthesis procedures serviced by separate algorithms," etc. I think that some fragments of the article could be simplified / concretized / made more accessible, this would allow the readership to expand and strengthen. I note that the reference block is unified, no editing is needed in this case: "probabilistic methods are also used in neural machine translation (NMT) systems based on neural network technology, in which computing systems are modeled in the image and likeness of biological neural networks. NMT models are trained not on N-grams, but on representations of a completed sentence in I and N. Even in this case, words remain important units in the source and translated texts, but "the connections between and target words, phrases and sentences of the source text and the translation text are established only implicitly, as mappings between their continuous representations." The structure of the text meets the standard of scientific research, the terms and concepts are unified, no serious discrepancies have been identified. The material, I note, is informative, strict, consistent and logical. The conclusions of the text do not contradict the main part; the author notes that "from the point of view of machine translation, the concept of a translation unit is of little use for statistical methods, and even more so for neural machine translation. However, both interpretations of the concept of translation unit are applicable to rule-based machine translation systems. In such systems, the concept of an analysis unit refers to the types of fragments of the source text that can be identified by the system, and the processing unit refers to how the translation algorithm affects the analyzed source text to create a translation text ...". The list of sources, in my opinion, can be adjusted – see the publication standard. 
The material has a practical orientation and would be useful in courses on translation theory and the pragmatics of translation. In general, the purpose of the work has been achieved, the tasks have been solved, and the research topic has been covered. I recommend the peer-reviewed article "On the applicability of the concept of unit of translation to machine translation" for publication in the scientific journal "Litera" of the publishing house "Nota Bene".