Philology: scientific researches
Reference:
Zenkov A.V., Zenkov M.A., Zenkov N.A.
Pelevin vs Sorokin: an Attempt of Stylometric Comparison
// Philology: scientific researches.
2024. № 7.
P. 130-141.
DOI: 10.7256/2454-0749.2024.7.71169 EDN: OLBCPG URL: https://en.nbpublish.com/library_read_article.php?id=71169
Pelevin vs Sorokin: an Attempt of Stylometric Comparison
DOI: 10.7256/2454-0749.2024.7.71169
EDN: OLBCPG
Received: 30-06-2024
Published: 01-08-2024
Abstract: Our study belongs to quantitative linguistics and focuses on the application of a new method for analyzing authorial style in literary texts. The method uses computer analysis of the numerical data found in texts, including both cardinal and ordinal numerals, written both in digits and in words. The authors used a program that automatically removes phraseological units and fixed expressions that contain numerals only incidentally. Before analysis, the text must be manually cleaned of numbers that do not contribute to the author's artistic vision, such as page numbers or chapter numbers. The analysis revealed that an author's use of numerals is unique and individual, forming a characteristic feature that distinguishes texts by different authors. For the first time, a formal quantitative stylometric analysis is performed on the literary works of Victor Pelevin and Vladimir Sorokin, authors whose literary styles share many similarities when viewed through the lens of a traditional descriptive philological approach. To validate the methodology, we also included the texts of four "impostor" authors in our analysis. Pelevin's and Sorokin's texts were found to differ significantly in their use of numerals. The data on occurrences of numerals in the texts were subjected to hierarchical clustering, which accurately divided the texts into groups according to authorship. Since clustering results can be influenced by the choice of both the metric and the clustering method, we tried various reasonable combinations of them. Each time, the dendrogram changed only slightly, so the clustering outcomes can be considered reliable.
The proposed new method of quantitative linguistics, which is based on the analysis of numerals in literary texts, has the potential to solve stylometric problems, particularly those related to the attribution of texts.
Keywords: stylometry, quantitative linguistics, text attribution, text authorship, numerals in texts, Victor Pelevin, Vladimir Sorokin, hierarchical cluster analysis, dendrogram, Manhattan metric
This article is automatically translated.
1. Introduction
The annual appearance of a new novel by Viktor Pelevin, a pattern unchanged for many years (the latest being Journey to Eleusis, 2023), draws the attention of the reading public and of literary critics [1-6] to this peculiar kind of socio-metaphysical fantasy, in which the comic and parodic coexist with black humor and absurdist plot twists, and apt everyday observations coexist with elements of the occult and surrealism. Pelevin has been compared to such masters of the socio-metaphysical fantasy genre as Gogol, Kafka and Borges, and in recent decades many have come to regard him as a writer who captured the spirit of the times and possessed the gift of foresight. Interest in Pelevin's personality is fueled by the almost complete secrecy of his private life, which recalls the "great recluses" J. D. Salinger and T. Pynchon. This has even given rise to rumors that the writer does not exist at all and that a group of authors works under the Pelevin brand; conversely, Pelevin's hidden authorship has been suspected in texts published under other names (see below).
The artistic features listed above are also largely characteristic of the work of Vladimir Sorokin, who, along with Pelevin, is considered one of the two stars of Russian postmodern literature, locked in a continuous unspoken rivalry [6-11]. Not only among general readers but also in literary criticism, the texts of these two authors are often considered together.
Without claiming to offer a literary-critical analysis of the works of Pelevin and Sorokin, in this paper we apply a formal quantitative approach to their texts, which, as far as we know, has not been done before.
Stylometry (and, more broadly, quantitative linguistics), the quantitative study of the authorial features of texts, including for the purpose of attribution, still lacks a fully satisfactory universal working method [12, 13]. Existing approaches compare, across the analyzed texts, the frequencies of content words and function words (prepositions, conjunctions), average word lengths, the most common words, and even letter combinations (oddly enough, the last approach often gives good results). Unfortunately, different methods often lead to contradictory conclusions, so it is more reliable to use several methods together. Promising results have been obtained with neural networks, and artificial intelligence will apparently soon be able to solve the problems of quantitative linguistics [14], but meaningful interpretation of the results is difficult with this approach, since the method itself is a "black box".
We have developed an original stylometric method based on the authors' use of numerals in their texts [15, 16]. Among the content parts of speech, numerals are by their nature the most easily quantifiable. For a literary (not rigidly factual) text generated by free imagination, it is natural to assume that the use of numerals is linked to the psychological characteristics of the author and shapes the result of the creative work in a way the author does not notice. The manner of using numerals is therefore an authorial feature (a fingerprint), which, under certain circumstances, makes it possible to solve the problem of a text's authorship.
Note also that, unlike the methods listed above, the statistics of numeral use are invariant under translation of a text into another language. This makes it possible to use an available translation when the original text is unavailable, and to compare quantitatively the texts of authors who worked in several languages (A. Strindberg, S. Beckett, V. V. Nabokov, ...). The analysis of works by several dozen authors in Russian, Czech, and English revealed tangible authorial features in the use of numerals, as well as the influence of genre, style, and artistic movement on them [17-22]. Thus, the results of the analysis allow a meaningful philological interpretation.
In this paper, we analyze the use of numerals in the main literary works of V. O. Pelevin and V. G. Sorokin, as well as in some other texts brought in to make the results more reliable.
2. Method and objects of research
We used a computer program that searches a Russian-language text for numerals written both in digits and in words, in their various word forms [22]. The search compares the words of the text with the dictionary base of M. Hagen's dictionary, A complete paradigm. Morphology. Frequency dictionary. Combined Dictionary (http://speakrus.ru/dict2/#morph-paradigm). The program automatically removed from the text phraseological units and fixed expressions that contain numerals incidentally, without the author's intention (for example, the Russian idioms rendered in English as "like the back of your hand" and "behind seven locks"). Beforehand, page numbers, chapter numbers, and enumeration labels such as 1), 2), 3), ... were manually deleted from the text.
We analyzed some of the most voluminous works of Pelevin and Sorokin, which are listed in Table 1.
The choice of texts for analysis was influenced by their availability for free download on the Internet, as well as by the fact that, at the time this work was prepared, they were not on proscription lists.
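The numeral search described in this section can be illustrated with a minimal sketch. It is not the program used in the study: the tiny lexicon `NUMERAL_FORMS`, the idiom stop list `IDIOMS`, and the function name `count_numerals` are hypothetical stand-ins for the full morphological dictionary and phraseology filter mentioned above.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon: Russian word form -> numeric value.
# The study relies on a full morphological dictionary instead.
NUMERAL_FORMS = {
    "один": 1, "одна": 1, "одного": 1, "первый": 1,
    "два": 2, "две": 2, "двух": 2, "второй": 2,
    "три": 3, "трёх": 3, "третий": 3,
    "пять": 5, "пяти": 5,
    "семь": 7, "семью": 7,
}

# Hypothetical stop list of fixed expressions whose numerals
# do not reflect the author's own choice.
IDIOMS = ["за семью замками", "как свои пять пальцев"]

# A token is either a run of digits or a run of lowercase Cyrillic letters.
TOKEN = re.compile(r"\d+|[а-яё]+")


def count_numerals(text):
    """Count numerals written in digits or in words, skipping listed idioms."""
    text = text.lower()
    for idiom in IDIOMS:
        text = text.replace(idiom, " ")
    counts = Counter()
    for token in TOKEN.findall(text):
        if token.isdigit():
            counts[int(token)] += 1
        elif token in NUMERAL_FORMS:
            counts[NUMERAL_FORMS[token]] += 1
    return counts


sample = "Два друга и 3 книги за семью замками, одна дверь."
counts = count_numerals(sample)
```

In this sample the numeral inside the idiom is ignored, while the digit and the two numeral words are counted. A real implementation would also need to handle compound numerals and ambiguous word forms, which this sketch does not attempt.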
3. Results
The inverse density of numerals is calculated for each text by dividing the volume of the text by the number of numerals found in it. The lower the inverse density, the more often numerals occur in the text. Even a comparison of the inverse densities reveals a significant difference between the works of Pelevin (No. 1-15 in Table 1) and those of Sorokin (No. 16-22): the average inverse densities differ by a third, with numerals used more often in Sorokin's texts. At the same time, judging by the spread of the inverse density across the analyzed texts (the ratio of the maximum to the minimum value is 1.6 for Pelevin and 2.2 for Sorokin), the manner of using numerals is more uniform in Pelevin.
An even clearer difference in the use of numerals by the two authors emerges from hierarchical cluster analysis [23], which groups objects (here, texts) into clusters by similarity. In our case this is the similarity of the absolute frequencies of the numerals 1, 2, 3, ..., 10 in the texts (these numerals are present in all analyzed texts without exception). Since the texts vary significantly in volume (see Table 1), we introduced correction coefficients to make the frequencies comparable, choosing Pelevin's S.N.U.F.F. as the reference text. For example, the frequencies for Generation P had to be multiplied by 1 285 434 / 832 755 = 1.54, and those for Day of the Oprichnik by 1 285 434 / 414 628 = 3.10.
In cluster analysis, similarity is measured by a metric $\rho$ (a "distance"): the smaller the distance between objects, the greater the similarity between them. We applied the Manhattan metric

$$\rho(x, y) = \sum_{i=1}^{n} |x_i - y_i|,$$

where x and y are n-dimensional vectors whose components are the corrected absolute frequencies of the first n natural numbers in the two texts being compared (here n = 10).
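The quantities introduced above can be written out as a short sketch. The function names are illustrative, and the frequency vectors at the end are hypothetical values chosen only to show the arithmetic (the actual counts are in Table 1); the three text volumes are the ones quoted above.

```python
def inverse_density(text_volume, numeral_count):
    """Text volume divided by the number of numerals found in the text."""
    return text_volume / numeral_count


def correction_coefficient(reference_volume, text_volume):
    """Factor that rescales a text's frequencies to the reference text's volume."""
    return reference_volume / text_volume


def corrected_frequencies(frequencies, coefficient):
    """Apply the correction coefficient to absolute frequencies of 1..n."""
    return [f * coefficient for f in frequencies]


def manhattan(x, y):
    """Manhattan distance: sum of absolute component-wise differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


# Volumes quoted in the text; the reference text is S.N.U.F.F.
REFERENCE_VOLUME = 1_285_434
k_generation = correction_coefficient(REFERENCE_VOLUME, 832_755)
k_oprichnik = correction_coefficient(REFERENCE_VOLUME, 414_628)
print(round(k_generation, 2), round(k_oprichnik, 2))  # 1.54 3.1

# Hypothetical absolute frequencies of the numerals 1..3 in two texts.
text_a = corrected_frequencies([100, 60, 30], 1.0)
text_b = corrected_frequencies([50, 40, 10], 2.0)
distance = manhattan(text_a, text_b)
```

Scaling by volume before computing the distance matters because the Manhattan metric works on absolute frequencies: without the correction, a long text and a short text by the same author would appear far apart simply because of their different lengths.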
For clustering, we used the farthest-neighbor method (complete linkage) [24], which tends to form compact, well-separated clusters. The studied texts were distributed into clusters exactly according to authorship (Fig. 1). The superclusters of Pelevin's and Sorokin's texts merge only at a large height (distance), which again confirms the substantial differences between the texts of the two authors. Note that this casts doubt on the marginal view that a group of authors writes under the "Pelevin" brand.
In modern stylometry, it is accepted that a comparison of the texts of two specific authors has evidentiary force regarding their similarity or difference only if the studied texts are "diluted" with texts by decoy authors, the so-called impostors [25]. Following this idea, we introduced additional literary texts (see Table 2) and repeated the clustering (Fig. 2). Several conclusions follow from Table 2 and Fig. 2:
· The additional texts were also clustered according to authorship;
· A work written jointly by two authors (O. Robski and K. Sobchak, No. 4 in Table 2) differs from the texts written by only one of them (O. Robski, No. 2 and 3 in Table 2) and clusters separately, which is an additional argument in favor of treating numeral usage as an authorial invariant;
· The texts of Pelevin and Sorokin still never fall into the same low-level cluster, which supports the conclusion made above about the significant differences between the texts of the two authors.
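As an illustration of the clustering step, here is a minimal pure-Python sketch of agglomerative clustering with the Manhattan metric and complete linkage. It is a naive implementation written for clarity, not the software used in the study, and the five frequency vectors are hypothetical.

```python
from itertools import combinations


def manhattan(x, y):
    """Manhattan distance between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def complete_linkage(c1, c2, vectors):
    """Farthest-neighbor distance: the largest pairwise distance between clusters."""
    return max(manhattan(vectors[i], vectors[j]) for i in c1 for j in c2)


def agglomerate(vectors, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns the final clusters (lists of text indices) and the merge history
    as (cluster_a, cluster_b, height) tuples, i.e. the dendrogram's content.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: complete_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        height = complete_linkage(clusters[a], clusters[b], vectors)
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges


# Hypothetical corrected frequencies: texts 0-2 by one author, texts 3-4 by another.
vectors = [[10, 8, 5], [11, 7, 5], [9, 8, 6], [20, 15, 12], [21, 16, 11]]
clusters, merges = agglomerate(vectors, 2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4]]
```

The merge heights recorded in `merges` correspond to the levels at which branches join in a dendrogram: texts by the same hypothetical author join at small heights, while the two author groups would join only at a much larger one.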
The novel "Okolonolia", a literary hoax published in 2009 under the pseudonym "Nathan Dubovitsky", requires separate consideration. In the disputes about its authorship, V. Sorokin and V. Pelevin, among others, were named as possible authors. Russian and foreign media have suggested that the novel was written by the Russian politician Vladislav Surkov, who himself has made contradictory statements on the matter. By now, his authorship is considered established [26]. What does our analysis of numeral statistics show? The inverse density of numerals for this text lies midway between the average values for the texts of Pelevin and Sorokin (Table 2); on the dendrograms (Fig. 2, 3), "Okolonolia" does not join a low-level cluster with any work by these authors. The hypotheses that Pelevin or Sorokin is the author are therefore rejected. Of course, this does not prove Surkov's authorship, but we have no other literary text of his with which to study the question.
The choice of metric and clustering method cannot be rigorously justified, yet it can significantly affect the clustering results. We therefore clustered the same texts as in Fig. 2 using, instead of the farthest-neighbor method, the between-groups linkage (group average) method [24], still with the Manhattan metric (Fig. 3). In our case the results proved quite stable, and all conclusions remain valid. Other reasonable combinations of metric and clustering method also change the dendrogram only slightly.
Table 1. The occurrence of numerals in the studied works
Table 2. The occurrence of numerals in the texts of impostor authors
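The two linkage rules compared in Section 3 differ only in how the distance between two clusters is derived from the pairwise distances between their members. The sketch below shows both rules side by side; the function names and the two small clusters of frequency vectors are hypothetical.

```python
def manhattan(x, y):
    """Manhattan distance between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def complete_distance(cluster_a, cluster_b):
    """Farthest-neighbor (complete linkage): the largest pairwise distance."""
    return max(manhattan(x, y) for x in cluster_a for y in cluster_b)


def average_distance(cluster_a, cluster_b):
    """Between-groups (group average) linkage: the mean pairwise distance."""
    pairs = [manhattan(x, y) for x in cluster_a for y in cluster_b]
    return sum(pairs) / len(pairs)


# Hypothetical clusters, each holding two texts' corrected frequency vectors.
cluster_a = [[10, 8, 5], [11, 7, 5]]
cluster_b = [[20, 15, 12], [21, 16, 11]]

print(complete_distance(cluster_a, cluster_b))  # 25
print(average_distance(cluster_a, cluster_b))   # 24.5
```

When clusters are compact and well separated, as in this toy example, the two rules give similar inter-cluster distances, which is consistent with a dendrogram that changes only slightly when the linkage method is swapped.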
4. Conclusions
The new approach to stylometric problems that we are developing, based on the statistics of numerals in texts, demonstrates high efficiency and sensitivity for all its simplicity. The texts of V. O. Pelevin and V. G. Sorokin, which until now had been compared only within the traditional descriptive philological approach, were for the first time subjected to a formal quantitative analysis, which correctly distributed the texts according to authorship. Significant authorial differences in the manner of using numerals were found. Including texts by third-party authors (impostors) in the analysis strengthens the significance of the result and confirms that it is not accidental. The method is suitable for the attribution of texts.
Figure 1. Result of applying hierarchical cluster analysis to the texts of V. O. Pelevin and V. G. Sorokin (the farthest-neighbor method and the Manhattan metric were used for clustering). The horizontal axis shows the "distance" in arbitrary units.
Figure 2. Result of applying hierarchical cluster analysis to the texts of V. O. Pelevin and V. G. Sorokin with the addition of texts by outside authors (the farthest-neighbor method and the Manhattan metric were used for clustering). The horizontal axis shows the "distance" in arbitrary units.
Figure 3. Result of applying hierarchical cluster analysis to the texts of V. O. Pelevin and V. G. Sorokin with the addition of texts by outside authors (the between-groups linkage method and the Manhattan metric were used for clustering). The horizontal axis shows the "distance" in arbitrary units.
References
1. Bogdanova, O.V., Kibalnik, S.A., & Safronova, L.V. (2008). Литературные стратегии Виктора Пелевина [Literary strategies of Victor Pelevin]. Saint Petersburg: Petropolis.
2. Polotovski, S.A., & Kozak, R.V. (2012). Пелевин и поколение пустоты [Pelevin and the generation of emptiness]. Moscow: Mann, Ivanov and Ferber.
3. Shilova, N.L. (2011). Визионерские мотивы в постмодернистской прозе 1960–1990-х годов (Вен. Ерофеев, А. Битов, Т. Толстая, В. Пелевин) [Visionary motives in postmodern prose of the 1960–1990s (Ven. Erofeev, A. Bitov, T. Tolstaya, V. Pelevin)]. Petrozavodsk: The Publ. House of the Karelian State Pedagogical Academy.
4. Khagi, S. (2018). Alternative Historical Imagination in Viktor Pelevin. Slavic and Eastern European Journal, 62(3), 483–502.
5. Khagi, S. (2023). Пелевин и несвобода: Поэтика, политика, метафизика [Pelevin and Unfreedom: Poetics, Politics, Metaphysics]. Moscow: Novoe literaturnoie obozrenie.
6. Lanin, B.A. (2015). Новая старая литературократия: Сорокин и Пелевин в борьбе с традициями [The new old literaturocracy: Sorokin and Pelevin's fight against tradition]. Cennosti i smysly, 40(6), 110–123.
7. Bogdanova, O.V. (2005). Концептуалист, писатель и художник Владимир Сорокин [Conceptualist, writer and artist Vladimir Sorokin]. Saint Petersburg: Saint Petersburg State University.
8. Andreeva, N.N., & Bibergan, E.S. (2012). Игры и тексты Владимира Сорокина [Games and texts of Vladimir Sorokin]. Saint Petersburg: Petropolis.
9. Marusenkov, M.P. (2012). Абсурдопедия русской жизни Владимира Сорокина: Заумь, гротеск и абсурд [The Absurdopedia: Vladimir Sorokin's Russian Life in Abstraction, Grotesque, and Absurdity]. Saint Petersburg: Aleteia.
10. Bibergan, E.S. (2014). Рыцарь без страха и упрёка: Художественное своеобразие прозы Владимира Сорокина [A Knight without Fear and Reproach: The Artistic Originality of Vladimir Sorokin's Prose]. Saint Petersburg: Petropolis.
11. Kalinin, I.A., Lipovetski, M.N., Dobrenko, E.A. et al. (2018). «Это просто буквы на бумаге…». Владимир Сорокин: после литературы ["These are just letters on paper...". Vladimir Sorokin: After Literature]. Moscow: Novoe literaturnoie obozrenie.
12. Stamatatos, E. (2009). A survey of modern authorship attribution methods. J. Amer. Soc. for Information Science and Technology, 60(3), 538–556.
13. Neal, T., Sundararajan, K., Fatima, A., Yan, Y., Xiang, Y., & Woodard, D. (2017). Surveying Stylometry Techniques and Applications. ACM Comput. Surv., 50(6), Article 86.
14. La Inteligencia Artificial ayuda a descubrir una obra desconocida de Lope de Vega en los fondos de la BNE, Biblioteca Nacional de España [Artificial Intelligence helps to discover an unknown work by Lope de Vega in the collections of the BNE, National Library of Spain]. https://www.bne.es/es/noticias/inteligencia-artificial-ayuda-descubrir-obra-desconocida-lope-vega-fondos-bne
15. Zenkov, A.V. (2017). Новый метод стилеметрии на основе статистики числительных [A new method of stylometry based on numerals statistics]. Kompiuternye issledovaniia i modelirovanie, 9(5), 837–850.
16. Zenkov, A.V. (2018). A Method of Text Attribution Based on the Statistics of Numerals. J. of Quantitative Linguistics, 25(3), 256–270.
17. Zenkov, A.V., & Místecký, M. (2019). The Romantic Clash: Influence of Karel Sabina over Macha's Cikani from the Perspective of the Numerals Usage Statistics. Glottometrics, 46, 12–28.
18. Zenkov, A.V. (2021). Stylometry and Numerals Usage: Benford's Law and Beyond. Stats, 4, 1051–1068.
19. Zenkov, A., & Místecký, M. (2022). Young Vladimír Vašek? – A Numerals Analysis Contribution to the Bezruč–Hrzánský Identity Issue. Naše řeč, 105(3), 151–161.
20. Zenkov, A.V. (2023). Литературные мистификации и авторское использование числительных [Literary hoaxes and the use of numerals by authors]. Filologicheskie nauki. Voprosy teorii i praktiki, 16(11), 3696–3709. https://doi.org/10.30853/phil20230568
21. Zenkov, A.V. (2023). Under a False Flag: Literary Hoaxes and the Use of Numerals. Litera, 10, 86–109. https://doi.org/10.25136/2409-8698.2023.10.68743
22. Zenkov, A.V., & Ermakov, N.E. (2023). Числительные в текстах как характерная особенность авторского стиля [The use of numerals in texts is a distinctive feature of the author's writing style]. Russian Linguistic Bulletin, 45(9). https://doi.org/10.18454/RULB.2023.45.28
23. Moisl, H. (2015). Cluster Analysis for Corpus Linguistics. De Gruyter Mouton.
24. Gan, G., Ma, C., & Wu, J. (2007). Data Clustering: Theory, Algorithms, and Applications. Society for Industrial and Applied Mathematics.
25. Koppel, M., & Winter, Y. (2014). Determining if two documents are written by the same author. J. of the Association for Information Science and Technology, 65(1), 178–187.
26. Plekhanova, I.I. (2013). Внутрилитературная полемика начала XXI века: мотивы и содержание («Околоноля» Н. Дубовицкого и «S.N.U.F.F.» В. Пелевина) [The intra-literary debate of the early 21st century: themes and content (N. Dubovitsky's "Okolonolia" and V. Pelevin's "S.N.U.F.F.")]. Filologicheski klass, 33(3), 26–32.