
Software systems and computational methods
Reference:

Batura T.V. Techniques of determining author’s text style and their software implementation

Abstract: The article reviews formal methods of text attribution. The problem of determining the authorship of a text arises in many fields and matters to philologists, literary critics, historians, and lawyers. In text attribution, both the main interest and the main difficulty lie in analyzing the syntactic, lexical/idiomatic, and stylistic levels of a text. Sentiment analysis (determining the tone of a text) is, in a sense, a narrower task, and techniques for solving it can be useful for identifying authorship. Expert analysis of an author’s style is complex and time-consuming, so new approaches that at least partially automate the experts’ work are desirable. The article therefore pays special attention to formal methods of author identification and to their software implementation. Data compression algorithms, methods of mathematical statistics and probability theory, neural network algorithms, and cluster analysis algorithms are currently applied to text attribution. The article describes the most popular software systems for identifying an author’s style in Russian-language texts. The author attempts a comparative analysis and identifies the features and drawbacks of the reviewed approaches. Among the problems hindering research in text attribution are the selection of linguostylistic parameters of a text and the selection of sample texts. The author concludes that further research is needed to find new methods of text attribution or improve existing ones, and to find new characteristics that clearly distinguish an author’s style, including in cases of short texts and a small number of sample texts.


Keywords:

text attribution, defining authorship, formal text parameters, author’s style, text classification, machine learning, statistical analysis, computer linguistics, identification of author’s style, analysis of textual information
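
The abstract lists data compression algorithms among the methods applied to text attribution. As a rough illustration of the general idea only (not the method of any system reviewed in the article), the sketch below assigns an unknown text to the candidate author whose sample corpus lets the text be compressed most cheaply. The use of Python's zlib as the compressor and the function names `compressed_size`, `compression_cost`, and `attribute` are assumptions made for this example.

```python
import zlib
from typing import Mapping


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def compression_cost(author_corpus: str, unknown: str) -> int:
    """Extra compressed bytes needed when `unknown` is appended to the corpus.

    A small value suggests the unknown text reuses patterns already
    present in the author's corpus.
    """
    combined = author_corpus + " " + unknown
    return compressed_size(combined) - compressed_size(author_corpus)


def attribute(unknown: str, corpora: Mapping[str, str]) -> str:
    """Return the author whose corpus gives the lowest compression cost.

    Hypothetical helper: `corpora` maps an author label to that
    author's concatenated sample texts.
    """
    return min(corpora, key=lambda author: compression_cost(corpora[author], unknown))


if __name__ == "__main__":
    # Toy data, invented purely for demonstration.
    corpora = {
        "A": "the quick brown fox jumps over the lazy dog near the quiet river bank " * 3,
        "B": "stock markets fell sharply as investors weighed inflation data and bond yields " * 3,
    }
    unknown = "the quick brown fox jumps over the lazy dog"
    print(attribute(unknown, corpora))
```

In practice the sample corpora would need to be of comparable length and far larger than this toy data, since zlib only looks back over a 32 KB window and very short texts give unstable size differences.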



