
History magazine - researches
Reference:

Database in the Study of the History of Estates in the Moscow Region: Filling, Enrichment and Analytics

Trishin Ivan

Master of History, Historical Information Science Department, Moscow State University 

301570, Russia, Tul'skaya oblast', pos. Volovo, ul. Aleksandrova, 4A, kv. 2

trishin_ivan@rambler.ru
DOI:

10.7256/2454-0609.2023.3.39859

EDN:

YZFHCY

Received:

27-02-2023


Published:

15-05-2023


Abstract: This article addresses the accumulation and systematization of information in studies devoted to 3D reconstructions of objects of historical and cultural heritage. Faced with a large number of heterogeneous sources, a researcher can lose track of the accumulated data, which significantly complicates the work. Because research that results in virtual reconstructions uses text sources together with graphic, scientific and technical documentation, as well as physical objects that are digitized beforehand, the accounting and cataloging of sources cannot be neglected. Solving this problem allows all the materials available in the study to be used as completely and efficiently as possible, which ultimately affects the quality of the resulting reconstruction. Using the example of a project for the reconstruction of estate complexes in the Moscow region, the author demonstrates a system for accumulating and enriching source information built on modern database management systems and software tools for working with data. The information aggregated and described in the MySQL database is supplemented with information from open sources using web scraping in the Python programming language, receives end-to-end identification, and becomes usable in various studies. Standardized information makes it possible to find the right source quickly, starting from the top level, and its enrichment provides additional opportunities for analysis and synthesis of all aggregated material.


Keywords:

Historical Information science, Source studies, 3D-reconstructions, Databases, Estates of Moscow region, Python, MySQL, Web-scraping, Data enrichment, Cultural heritage

This article is automatically translated.

Modern scientific research is increasingly carried out at the intersection of disciplines, which blurs the boundaries between their toolkits. Methods and technologies developed in the natural sciences can be used in humanities research, and vice versa. Knowledge and methods accumulated in one field can benefit an entirely different one, so the researcher's toolkit keeps expanding along with the general development of scientific methodology and its particular applications.

The development of high technologies has allowed historical science to draw on large amounts of information in obtaining new knowledge. The concept of a mass historical source, introduced into scientific circulation by I.D. Kovalchenko, makes it possible to cover significant volumes of historical documents that earlier researchers had not fully studied. Already in the 1960s and 1970s, computing power made it possible to build large information systems. Working with them not only reduced the time spent preparing for research but made such research possible in the first place, since manual processing of tens of thousands of forms and cards had previously brought no practical benefit.

Statistical studies in historical science were followed by works based on databases. The popularity of such databases in the 1980s and 1990s gave a new impetus to the study of mass sources: their information was now entered in full into tabular databases and could be reused in further research. A number of studies still use databases to obtain new knowledge, and the historiography of this area includes many publications covering a variety of research interests[1][2].

* * *

In studies devoted to 3D reconstructions of historical and cultural heritage monuments, a special role is played by collecting and structuring the data needed for the reconstruction itself. Yet whenever work in the chosen direction continues, the author has to plunge again into voluminous materials that, as a rule, remain a "dead weight" in files and folders. An information system that describes the researcher's work with the historical material can greatly facilitate work on the chosen topic and can also yield additional results that complement the main conclusions of the study.

Attempts to restore manor complexes were made repeatedly during the Soviet period. Most often the work was limited to recording the current state of the complexes; less often some restoration work was carried out. Such attempts usually remained plans and architectural projects, and the transition to practical restoration of the monuments was not completed. In the post-Soviet period, attempts were also made to restore manor complexes with the active assistance of patrons, but most of them were unsuccessful. Interest in manor history nevertheless grew, and its study received a new impetus from the opportunity to build virtual reconstructions and visualize them in a software environment. Since 2019, the Faculty of History of Moscow State University has been working on a joint project with the Central State Archive of the Moscow Region[3] dedicated to virtual reconstructions of lost estates in the Moscow region, in which the author of this article is directly involved. Thanks to the efforts of historians and archivists, there are now virtual reconstructions of such estates near Moscow as Pushchino-on-Nara[4], Petrovskoye-Alabino[5], Olgovo[6], Molodi[7] and Nikolskoye-Uryupino[8][9], as well as an unpublished reconstruction of the Mikhailovskoye estate and a number of other works defended at the Department of Historical Informatics of Lomonosov Moscow State University.

The history of manor complexes in the Moscow region has been studied by the Society for the Study of the Russian Estate (OIRU), a leading organization in the study of Russian manor culture and history. The source used most actively in this study was the reference book "Estates near Moscow"[10], created by the non-profit partnership "Russian Estate" with the participation of members of the Society. This reference book made it possible to systematize knowledge about the lost complexes and identify the most promising objects for reconstruction. While studying the reference book, the author decided to build a structured information system that would help identify the most interesting objects for reconstruction and show the distribution of preserved and lost estates geographically and chronologically. Information about the owners of these estates was also entered into the database, with the tables linked by the identifier of the estate complex. In addition, the estates studied were linked to electronic copies of sources. On this basis an electronic reference book was formed, which the author discussed in the article "The use of data management technologies in the creation of historical Internet resources"[11]. That article described the mechanisms for loading data into the database manually through the interface. The second stage of work on the information system involved filling the tables with information from the published reference book and enriching this information with data from open sources. Some results of this work are summarized below, including statistical generalizations on the estate complexes of the Moscow region, and the process of enriching the available data is described[12].

* * *

The data storage and processing system was built as follows. The visual interface of the directory was developed on the WordPress website builder, the database itself was built in the MySQL database management system, and automatic data transfer was carried out with the Apache NiFi tool (all of these tools are free and freely distributed on the Internet). In addition, the SQLite DBMS (a file-based database that stores all tables and relationships in a single portable file) was used, along with the Jupyter Notebook development environment, which allows data transformations in the Python programming language. Initially, the database was to be divided into four main tables: estates, personalities, sources and objects (models). This concept changed slightly in the course of the work: a table of reconstructions was added to the table of objects, and the rest of the structure remained unchanged. The table of manor complexes holds basic information on each estate: the identifier (the number of the estate in the directory), the name, the district or urban district of the Moscow region, the century of foundation, the quarter of the century in which the estate was founded, the year of foundation, the degree of preservation of the estate (the author's expert assessment on a scale from 0 to 5), a flag for the preservation of the main house, and comments. These attributes are found in almost all directory entries. In total, the table received 629 entries, one for each estate recorded in the directory. The classification of the preservation of estates, listed among the attributes, deserves a closer look.
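The table structure described above can be illustrated with a minimal sketch in Python's built-in sqlite3 module. All table and column names here are assumptions inferred from the prose, not the author's actual schema, and the table of objects (models) is omitted for brevity.

```python
import sqlite3

# Hypothetical schema: every table and column name below is an illustrative
# assumption derived from the prose description, not the author's actual DDL.
SCHEMA = """
CREATE TABLE estates (
    estate_id            INTEGER PRIMARY KEY,   -- number of the estate in the directory
    name                 TEXT NOT NULL,
    district             TEXT,
    century              INTEGER,
    quarter              INTEGER,
    year_founded         INTEGER,
    preservation_degree  INTEGER CHECK (preservation_degree BETWEEN 0 AND 5),
    main_house_preserved INTEGER CHECK (main_house_preserved IN (0, 1)),
    comment              TEXT
);
CREATE TABLE owners (
    owner_id  INTEGER PRIMARY KEY,
    estate_id INTEGER NOT NULL REFERENCES estates (estate_id),
    surname   TEXT NOT NULL,
    initials  TEXT
);
CREATE TABLE sources (
    source_id   INTEGER PRIMARY KEY,
    estate_id   INTEGER NOT NULL REFERENCES estates (estate_id),
    description TEXT,
    file_path   TEXT
);
CREATE TABLE reconstructions (
    reconstruction_id INTEGER PRIMARY KEY,
    estate_id         INTEGER NOT NULL REFERENCES estates (estate_id),
    description       TEXT,
    file_path         TEXT
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
    return conn


conn = create_database()
conn.execute(
    "INSERT INTO estates (estate_id, name, preservation_degree, main_house_preserved) "
    "VALUES (?, ?, ?, ?)",
    (1, "Example estate", 3, 0),
)
conn.execute(
    "INSERT INTO owners (estate_id, surname, initials) VALUES (?, ?, ?)",
    (1, "Yankov", "A.H."),
)
# Tables are linked by the identifier of the estate complex.
rows = conn.execute(
    "SELECT e.name, o.surname, o.initials "
    "FROM estates AS e JOIN owners AS o ON o.estate_id = e.estate_id"
).fetchall()
print(rows)
```

The CHECK constraint keeps the preservation degree within the 0 to 5 scale, and the foreign keys tie owners, sources and reconstructions to an estate identifier, which is the linking principle the text describes.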

Almost every entry in the directory contains notes about which objects have been preserved and have survived to the present day, as well as their condition. This information allowed us to identify six main levels of classification of the preservation of manor complexes (Table 1).

Table 1. The degree of preservation of manor complexes (according to the author's classification)

Degree of preservation | Comment
5 | The estate has been completely preserved or successfully restored and is in active use
4 | The estate has been preserved with minor losses; it is abandoned but has not been destroyed or lost
3 | The estate is in ruins, but the layout of buildings and structures survives; the ruins make it possible to determine the plan of the estate and can be partially used for virtual reconstruction
2 | Only some objects (outbuildings) survive, ruined or seriously altered
1 | A single object indicates the presence of a manor (most often a parish church built by the founder of the manor or his descendants in a village that belonged to them), ruined or functioning
0 | The manor has not been preserved. Remnants of garden and park landscaping may remain.

This classification was supplemented by a flag for the preservation of the main house of the estate as the object around which the complex is organized. The analysis of the preservation of the 629 complexes led to the following results:

  • 49 estate complexes (7.8%) are in excellent condition. Among them are such estates as Arkhangelskoye, Abramtsevo, Talitsy, etc.
  • 70 estate complexes (11%) have the fourth degree of preservation (Pekhra-Yakovlevskoye, Ogarkovo, Ashitkovo, etc.).
  • 400 estates (63.5%) have a degree of preservation of 0 or 1. Most often these are estates in which only the church has been preserved (see Table 1).
  • In 163 estates (26%), the main house has been preserved in one form or another.
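Shares of this kind can be recomputed from the preservation column of the estates table. The sketch below is only an illustration: the function name is hypothetical and the sample list is toy input, not the author's data.

```python
from collections import Counter


def preservation_shares(degrees):
    """Return {degree: (count, percent)} for a list of 0-5 preservation degrees."""
    counts = Counter(degrees)
    total = len(degrees)
    shares = {}
    for degree in range(6):
        count = counts.get(degree, 0)
        percent = round(100 * count / total, 1) if total else 0.0
        shares[degree] = (count, percent)
    return shares


# Toy input for illustration only, not values from the directory.
sample = [5, 4, 4, 1, 0, 0, 1, 3, 2, 0]
shares = preservation_shares(sample)
print(shares[0])  # count and percentage of estates with degree 0
```

Applied to the figures quoted in the list, the same rounding gives 7.8% for 49 of 629 estates and 25.9% for 163 of 629.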

Next, we give some statistics on the periodization of the foundation of estates in the Moscow region:

  • The largest number of estates (165) was founded in the second half of the XVIII century.
  • For 35 complexes, the time of foundation is not specified.
  • The earliest examples date back to the beginning of the XVI century (boyar patrimonial estates), but only 3 estates founded before the last quarter of that century were found.
  • 16 estates were founded at the beginning of the XX century. Most often they were owned by industrialists and by foreign craftsmen invited to work in Moscow or the Moscow region. No large complexes were found among them.
  • 92 records contain the exact (or approximate) year of foundation of the estate.

In some cases the directory also indicates the reasons for the loss of the manor complex and the approximate dates of destruction. The most frequent reasons are fires and demolition. The most common date ranges are 1918-1930 (the period of post-revolutionary destruction of estates), 1941 (many estates suffered from German bombing), and 1990-2000. Nevertheless, restoration work was carried out on a number of complexes, mainly in the 1970s and 1980s and in most cases by groups of restorers, local historians and specialists from architectural restoration workshops. It resulted in architectural measurements and restoration projects containing drawings of the state of the estate at the time of the project.

From the point of view of geographical location, the largest number of estates is located on the territory of Odintsovo city district (42), followed by Solnechnogorsk and Dmitrov districts (38 and 34, respectively). The smallest number of complexes (2) is located in the territories of the Pavlovo-Posadsky district, as well as the urban districts of Orekhovo-Zuyevo and Yegoryevsk.

Other classifications of manor complexes require additional materials and are most often hindered by the historical context in which these complexes developed: "noble" and "merchant" estates overlap once a noble family sells its estate to successful entrepreneurs and merchants, and an architectural classification is impossible because most complexes have been lost as a whole.

An important aspect in deciding whether a reliable reconstruction of an estate is possible is how well the object is supported by sources. For scientific reconstruction, only those complexes are selected that are backed by a wide range of information collected in archives, restoration workshops and private collections. The available information on such estates as Nikolskoye-Uryupino and the Chernyshev estate of Yaropolets (the objects of the author's own research on virtual reconstructions) has been added to the main table of manor complexes and supplemented with data obtained from archives or open sources. Classifying sources by the identifier of the estate and of the archival file (or, where there is none, by other attributes of a particular publication) makes it possible to quickly locate the digital copies of sources on a local computer and to find the right sheet of an archival file by keywords. At present, more than 500 documents are described in the source table. The tables of virtual reconstructions of estates were built in the same way: the fields record where the reconstructions are stored (both individual models and projects made in the Unreal Engine environment), and descriptions were added to simplify keyword search. The situation with the owners of manor complexes is different: identifying them required enriching the data with information from open sources.
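A keyword lookup of the kind described for the source table might look like the following sketch. The column names, the helper `find_sources` and the sample rows are hypothetical placeholders; the real table layout is not given in the article.

```python
import sqlite3


def find_sources(conn, estate_id, keyword):
    """Return (archive_file, sheet, file_path) rows for one estate whose
    description contains the keyword. Note that % and _ inside the keyword
    act as LIKE wildcards in this simple version."""
    query = (
        "SELECT archive_file, sheet, file_path FROM sources "
        "WHERE estate_id = ? AND description LIKE ? "
        "ORDER BY archive_file, sheet"
    )
    return conn.execute(query, (estate_id, f"%{keyword}%")).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources ("
    " source_id INTEGER PRIMARY KEY,"
    " estate_id INTEGER NOT NULL,"
    " archive_file TEXT,"   # archival file reference; may be empty for publications
    " sheet TEXT,"
    " description TEXT,"
    " file_path TEXT)"      # where the digital copy is stored locally
)
# Placeholder rows, invented for the example.
conn.executemany(
    "INSERT INTO sources (estate_id, archive_file, sheet, description, file_path) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "FILE-A", "12", "general plan of the estate", "scans/1/file-a/012.jpg"),
        (1, "FILE-A", "13", "inventory of furniture", "scans/1/file-a/013.jpg"),
        (2, "FILE-B", "3", "plan of the park", "scans/2/file-b/003.jpg"),
    ],
)
hits = find_sources(conn, 1, "plan")
print(hits)
```

Filtering by estate identifier first and by description second mirrors the two-level classification in the text: the estate and archival file locate the storage place, and the keyword narrows the result to a sheet.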

***

The published directory contains an alphabetical index of estate owners, which gives the number of the estate, its name, and the surname and initials (sometimes only the surname) of the owner. Processing this list manually (more than three thousand entries) would take a significant amount of time, so the TurboScan mobile application, which digitizes images and recognizes text, was used. The list of owners was thus converted to CSV (a tabular text format), but the table needed additional processing before it could be added to the database.

The main problem was records of dynasties of owners, for example "Yankov A.H., N.A., H.N.", where one record lists three people at once. To obtain a list suitable for further work, such records had to be split into several new lines. This was done in the Jupyter Notebook development environment (an interface for working with Python) with the help of a number of open Python libraries. Using the regular expression library[13],[14], the entire list of surnames was processed (Fig. 1) and a new table of estate owners was formed, in which the list grew from 2492 lines to 3140 (648 names were added). In the new temporary table of owners, the surnames were placed in a separate column for further data enrichment (Fig. 2).

Fig. 1. Code snippet for processing records with multiple names (screenshot of the author of the work)

Fig. 2. Fragment of the updated owners table. From number 10, the result of regular expressions is visible (screenshot of the author of the work)

A separate list of surnames is important when enriching information with open sources.
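The splitting of dynasty records can be sketched with Python's re module as follows. This is not the author's code from Fig. 1, only an illustration; the pattern assumes that initials are one or two capital letters, each followed by a dot and written without spaces, and the function names are hypothetical.

```python
import re

# One or two capital initials, each followed by a dot, e.g. "A.H." or "N."
# (Latin and Cyrillic capitals; the exact record format is an assumption).
INITIALS = re.compile(r"(?:[A-ZА-ЯЁ]\.){1,2}")


def split_record(record):
    """Split 'Surname I.I., I.I.' into one (surname, initials) pair per person."""
    record = record.strip()
    first = INITIALS.search(record)
    if first is None:
        return [(record, "")]  # the index sometimes gives only a surname
    surname = record[: first.start()].strip(" ,")
    return [(surname, m.group()) for m in INITIALS.finditer(record, first.start())]


def expand(records):
    """Flatten a list of index records into one row per person."""
    rows = []
    for record in records:
        rows.extend(split_record(record))
    return rows


rows = expand(["Yankov A.H., N.A., H.N.", "Sheremetev"])
for surname, initials in rows:
    print(surname, initials)
```

Keeping the surname in its own field, separate from the initials, is what makes the later lookup by family name possible.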

For the current work, the Internet resource "Rodovod" was selected (Rodovod: Multilingual family tree [website] URL: https://ru.rodovid.org/wk/Multilingual genealogical tree), the genealogical resource most convenient for processing. Obtaining information about 3140 persons manually would require a huge amount of time and effort, so the process was automated using web scraping (15 top web scraping solutions of 2021 / Habr [website] URL: https://habr.com/ru/post/543760), that is, automatic crawling of web pages in order to extract information.

Information about each family on this resource is available at a link of the form "https://ru.rodovid.org/wk/Род:***", where the asterisks are replaced by the surname of interest in the plural (for example, Orlov becomes Orlovy, Adlerberg becomes Adlerbergi, etc.). The page contains a section listing representatives of the family with basic information about each person (full name, gender, year of birth, year of death, link to an article about the person). Such a list is useful in the current work for comparing surnames and initials with years of birth and death (reference books sometimes give these in the information about manor complexes). To get a unique list of families, all entries in the "surname" column were converted to the plural, again with regular expressions. For this purpose, a dictionary of the most frequent singular and plural endings (ov – ovy, ev – evy, berg – bergi, vich – vichi, etc.) was compiled; all entries were checked against it, which produced a list of the families of estate owners (about 1600 unique entries). This list was automatically looked up in the Rodovod electronic directory (in compliance with the scraping conditions specified in the site's "robots.txt" file), and the data obtained were collected in a common table. In this way, 105,000 records of people belonging to these families were obtained (Fig. 3). Using regular expressions, the full names were converted into surnames with initials, which can then be compared with the original published reference book.
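The pluralization step and the robots.txt check can be sketched as below. The ending dictionary is an assumed Cyrillic rendering of the endings listed in the text, the sample robots.txt content is invented for the example (it is not the site's actual file), and no network request is made. The sketch uses plain suffix matching where the article mentions regular expressions.

```python
from urllib.parse import quote
from urllib.robotparser import RobotFileParser

# Assumed singular -> plural endings, longest first; a real dictionary would be larger.
PLURAL_ENDINGS = [
    ("берг", "берги"),
    ("вич", "вичи"),
    ("ов", "овы"),
    ("ев", "евы"),
]

BASE_URL = "https://ru.rodovid.org/wk/"


def pluralize(surname):
    """Return the plural family name, or None if the ending is not in the dictionary."""
    for singular, plural in PLURAL_ENDINGS:
        if surname.endswith(singular):
            return surname[: -len(singular)] + plural
    return None  # left for manual review


def family_url(surname):
    """Build the family page URL of the form .../wk/Род:<plural surname>."""
    plural = pluralize(surname)
    if plural is None:
        return None
    return BASE_URL + quote("Род:" + plural, safe=":")


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented robots.txt content, used only to show the check.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"

url = family_url("Орлов")
print(url, is_allowed(SAMPLE_ROBOTS, url))
```

In a real crawl, the robots.txt text would be downloaded from the site once, each URL would be checked before it is requested, and requests would be spaced out to avoid loading the server.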

Fig. 3. The table of data received from the resource "Rodovod" (screenshot of the author of the work)

As a test of combining the information, the first 100 estate complexes in the list were checked. They mention 553 names, of which 203 contain at least one date (birth or death). Among these records, 77 people were matched by dates and initials, which is approximately 38% of the records containing dates. Manual analysis found 8 more people, raising the share of owners found to about 42% (85 of 203). If this ratio holds for the rest of the directory, about 500 people should be identified on the full list, which, given the limited amount of information, should be considered a good working result. Work on finding the owners of estates in the open database continues.
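The comparison by initials and dates can be sketched as follows. The field names, the assumed "Surname Given-name Patronymic" word order and the sample records are all hypothetical; the records are invented placeholders, not data from the directory or from Rodovod.

```python
import re

# A name part; hyphenated parts are kept together.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def to_surname_initials(full_name):
    """'Surname Given Patronymic' -> ('Surname', 'G.P.'); the word order is an assumption."""
    parts = WORD.findall(full_name)
    if not parts:
        return "", ""
    initials = "".join(part[0].upper() + "." for part in parts[1:3])
    return parts[0], initials


def find_matches(owner, people):
    """Return people whose surname and initials fit the owner and who share at
    least one known date (birth or death) with the directory record."""
    matches = []
    for person in people:
        surname, initials = to_surname_initials(person["full_name"])
        if surname != owner["surname"] or not initials.startswith(owner["initials"]):
            continue
        if any(owner[key] is not None and owner[key] == person[key]
               for key in ("birth", "death")):
            matches.append(person)
    return matches


# Invented placeholder records for illustration only.
owner = {"surname": "Ivanov", "initials": "I.P.", "birth": 1750, "death": None}
people = [
    {"full_name": "Ivanov Ivan Petrovich", "birth": 1750, "death": 1801},
    {"full_name": "Ivanov Ilya Pavlovich", "birth": 1790, "death": 1850},
    {"full_name": "Ivanov Sergey Petrovich", "birth": 1750, "death": 1810},
]
matched = find_matches(owner, people)
print([person["full_name"] for person in matched])
```

Requiring at least one matching date in addition to the initials reduces false matches between namesakes, at the cost of leaving records without dates for manual review.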

After the information about the owners was added to the general list, it became possible to form links between the tables and to prepare the format for uploading data to the electronic environment, following the algorithm described earlier in the article "The use of data management technologies in the creation of historical Internet resources"[11]. As a result, the electronic directory was filled in automatically, without significant manual work.

***

In conclusion, the complexity of accumulating and systematizing data in the study of the history of manor complexes should be emphasized once again. Fragmented and haphazard storage of information forces the researcher to look for solutions in modern ways of structuring and generalizing data. To solve this problem, the author created a software system based on the published reference book, in which the researcher can search for the necessary information about estates and automate the loading and enrichment of data, thus covering the whole chain of actions from the initial search for information to work with the source. The proposed system demonstrates how the information complex can be handled with software tools and how it can be enriched from external sources. The possibilities for analytical work beyond the original directory are thereby expanded not only by the additional information but also by the software capabilities of the systems described above.

References
1. Garskova I.M. Historical information science. Evolution of the interdisciplinary direction. - St. Petersburg: Aleteyya Publishing House, 2018. - P. 210-214.
2. Yumasheva Y.Y. Historical and biographical research: methods and databases // Ural Historical Bulletin. - 2015. - No. 4 (49). - P. 146-152. EDN: UYJSSP.
3. Borodkin L.I., Gerasimova Y.N. Virtual reconstruction of historical estate complexes: collaboration between historians and archivists, project activities of students // Historical informatics. - 2020. - No. 3. - P. 103-111. DOI: 10.7256/2585-7797.2020.3.34273 URL: https://nbpublish.com/library_read_article.php?id=34273
4. Mamonova S.A. Virtual reconstruction of the Pushchino-on-Nara estate near Moscow: sources, methods and technologies of research // Historical informatics. - 2020. - No. 3. - P. 136-165. DOI: 10.7256/2585-7797.2020.3.34245 URL: https://nbpublish.com/library_read_article.php?id=34245 (accessed 02/19/2023).
5. Poshevelya S.A. Virtual reconstruction of the estate near Moscow Petrovskoe-Alabino: sources, methods and technologies of research // Historical informatics. - 2020. - No. 3. - P. 166-184. DOI: 10.7256/2585-7797.2020.3.33979 URL: https://nbpublish.com/library_read_article.php?id=33979 (accessed 02/19/2023).
6. Sorokina K.E. Virtual reconstruction of the estate near Moscow Olgovo: sources, methods and technologies of research // Historical informatics. - 2020. - No. 3. - P. 112-135. DOI: 10.7256/2585-7797.2020.3.34229 URL: https://nbpublish.com/library_read_article.php?id=34229 (accessed 02/19/2023).
7. Kondrasheva D.I. Virtual reconstruction of the Molodi estate near Moscow: sources, methods and technologies of research // Historical informatics. - 2020. - No. 3. - P. 185-210. DOI: 10.7256/2585-7797.2020.3.33989 URL: https://nbpublish.com/library_read_article.php?id=33989 (accessed 02/19/2023).
8. Trishin I.G. Three-dimensional reconstruction of the estate complex Nikolskoe-Uryupino (Krasnogorsk city district, Moscow region): research methods and technologies // Historical informatics. - 2020. - No. 3. - P. 211-234. DOI: 10.7256/2585-7797.2020.3.33955 URL: https://nbpublish.com/library_read_article.php?id=33955 (accessed 02/19/2023).
9. Malandina T.V. Virtual 3D reconstruction of the interiors of estates near Moscow in the 18th – early 20th centuries: ceremonial interiors of the Nikolskoye-Uryupino estate complex // Historical informatics. - 2021. - No. 2. - P. 134-170. DOI: 10.7256/2585-7797.2021.2.36029 URL: https://nbpublish.com/library_read_article.php?id=36029 (accessed 02/19/2023).
10. Estates near Moscow. Catalog with a map of the location of estates. - M.: NP "Russian Estate", 2018. - 408 p.
11. Trishin I.G. The use of data management technologies in the creation of historical Internet resources // Historical informatics. - 2022. - No. 2. - P. 18-27. DOI: 10.7256/2585-7797.2022.2.38334 EDN: JTFMFO URL: https://nbpublish.com/library_read_article.php?id=38334 (accessed 02/19/2023).
12. Kovalenko M.V. Analysis of data enrichment methods // Science without borders. - 2021. - No. 5 (57). URL: https://cyberleninka.ru/article/n/analiz-metodov-obogascheniya-dannyh (accessed 02/19/2023).
13. Kolmogortsev S.V., Saraev P.V. Extracting bibliography from texts by regular expressions // New information technologies in automated systems. - 2017. - No. 20. URL: https://cyberleninka.ru/article/n/izvlechenie-bibliografii-iz-tekstov-regulyarnymi-vyrazheniyami (accessed 02/19/2023).

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

Review of the article "Database in the study of the history of estates of the Moscow region: filling, enrichment and analytics"
The reviewed article addresses the urgent problem of creating databases focused on the systematization and further study of information about noble estates of the Moscow region, which number in the hundreds. The degree of preservation of these cultural heritage sites varies very significantly: from those that have been completely or partially lost to those that have been preserved in an almost authentic form. Such databases make it possible to make a reasoned selection of estates that are of interest for the development of 3D models, in order to create reliable virtual reconstructions based on a representative set of sources. The article provides a brief description of the experience of joint projects of the Central State Archive of the Moscow Region and the Faculty of History of Moscow State University in this field of research. The author of the article takes the next step in the development of such projects. The database he proposes is based on information from the directory "Estates near Moscow", published recently with the participation of the Society for the Study of the Russian Estate (OIRU). The author offers a visual interface for working with the directory information based on the WordPress website builder and the MySQL database management system. The author uses the Jupyter Notebook development environment, which allows data transformation in the Python programming language. The database contains 4 main tables: Estates, Information about owners, Sources and Objects (models and reconstructions). The author's expert assessment of the degree of preservation of the main house of each estate is of interest: from complete preservation (rating 5) to complete loss (rating 0).
Based on the proposed preservation criteria, the author obtained a reasonable estimate of the number of each group: out of the total number of 629 registered estates, about 8% are in excellent condition today, 11% are in good condition, 63.5% are in satisfactory or unsatisfactory condition. In 26% of cases, the main house of the estate has been preserved in one form or another. It is significant that according to the information provided in the database, the largest number of estates were created in the second half of the XVIII century, while the earliest dates date back to the beginning of the XVI century. The database also allows you to identify the causes of the loss of manor complexes and approximate dates, as well as the geographical location of the estates in the Moscow region. From the point of view of the possibility of building a reliable reconstruction of estates, the most important information is the source base on the history and architecture of each estate complex. Currently, more than 500 documents are described in the corresponding database table. The identification of information about the owners of the estates required the enrichment of the available archival data with information from open sources. An important component of the author's methodology is working with data on the dynasties of owners, which is provided by a program in the Python language, as a result of which the author has formed an expanded table of owners of estates, including 3140 people. The author collects information about these personalities using web scraping technologies. In general, the research conducted by the author of the article is a significant contribution not only to the further study of the history of Russian estates and their virtual reconstruction, but also to the use of genealogical Internet resources to enrich the original source complex based on the original software solutions proposed by the author. 
The article is written in a good academic style, the relevance and scientific novelty of the work are beyond doubt. The research methodology and the results obtained by the author will certainly arouse the interest of the readership. The article can be recommended for publication in the journal "Historical Journal: scientific research".