Rinchinov O.S. —
The Diachronic Corpus of the Buryat Language as a Digital Tool for Historical Research: Approaches, Solutions and Experiments
// Historical informatics. – 2020. – ¹ 2.
– P. 26 - 34.
DOI: 10.7256/2585-7797.2020.2.33446
URL: https://en.e-notabene.ru/istinf/article_33446.html
Read the article
Abstract: The article studies the diachronic corpus of the Buryat language compiled on the basis of annals written in old Mongolian used to reconstruct the history and historical geography of the Buryat people. In this regard, the article discusses the main problems of semantic markup of corpus data. The size of the corpus currently exceeds 82,000 words. The research novelty is that classical Mongolian texts presented in Latin transliteration are addressed by computer linguistics methods for the first time. The author describes approaches to develop the ontological outline of the historical and cultural subject area as well identifies the kinship and geographical context elements. The MS Access and SQL simulation experiment demonstrates the advantages of the authority control methodology, in particular the “family” and “place” categories, for the initial analysis of corpus data and the formation of semantic clusters. The use of authoritative records has significantly accelerated the accumulation of empirical data for automation of the substantive analysis of texts in the corpus. These experiments allowed the author to see further steps to create and improve the Buryat language diachronic corpus semantic markup tools and transform this language into a convenient tool for historical research.