
Historical informatics
Reference:

The use of data management technologies in the creation of historical Internet resources

Trishin Ivan

Master of History, Historical Information Science Department, Moscow State University 

301570, Russia, Tul'skaya oblast', pos. Volovo, ul. Aleksandrova, 4A, kv. 2

trishin_ivan@rambler.ru
DOI: 10.7256/2585-7797.2022.2.38334

EDN: JTFMFO

Received: 26-06-2022

Published: 19-07-2022


Abstract: This article addresses the problem of preserving and distributing the results of virtual reconstructions of historical and cultural heritage objects by creating specialized electronic reference books based on the Wordpress website builder. Using the example of a handbook on virtual reconstructions of estates in the Moscow region, the author shows how modern data management technologies make it possible to configure the system so that filling and administering it takes as little time as possible for both the authors of publications and the directory administrator. The main problem solved by such a system is eliminating the need to enter each article manually and then approve its publication by hand. The author presents a trial of data management technologies in historical Internet resources. The tools described in the article are used in industrial data management systems, but their availability allows them to be used in other fields, including as an auxiliary tool in scientific research. Automatic transfer of data from the user's web form to a new directory page significantly speeds up the filling of an Internet resource. The proposed version of the directory was created using the MySQL database, the Apache NiFi data orchestrator and the Wordpress website builder. All of the listed tools are free and available for download from their official pages.


Keywords:

Virtual reconstructions, Cultural heritage, Historical Internet resources, Suburban estates, Databases, Wordpress, Apache NiFi, MySQL, Moscow oblast, Source studies

This article is automatically translated.

In recent decades, the global network space has begun to play a significant role in people's lives. Science is not lagging behind this trend, and therefore the number of electronic scientific publications is growing, the number of scientific and popular scientific resources is increasing, which allow researchers to find the necessary information and, less often, share their best practices with each other. Internet resources are increasingly being used as a platform for publishing the results of a research or amateur project, which makes these results available to a wide range of ordinary users, increasing interest in the electronic resources of such projects.

Such resources can be formed both spontaneously (by enthusiasts and amateurs) and through organized processes (most often within grant projects or other historical initiatives in which professional historians work).[1] The second category also includes thematic sites created on the basis of special seminars[2] or special courses[3] at history faculties. While amateur sites most often aim to attract an audience interested in general topics, the resources of professional authors serve both to publish the results of a project (for example, a resource dedicated to the reconstruction of the White City of Moscow[4]) and as an auxiliary resource for further study of a given issue (the project "Electronic Resources on the working history of Russia"[5]). Such resources should be built according to certain rules that allow the historical topics presented to be fully supported with information.[1],[6] Most often, though, historians depend on professional developers who design the appearance of the site and its interaction with the hosting system. In addition, such resources usually contain information about a completed project, so new data is rarely entered into the system. However, creating a large-scale Internet resource with hundreds or thousands of pages filled in manually requires a huge number of man-hours, not only for simple data entry but also for page layout, and work on such a project may therefore be delayed.

This article describes part of a project dedicated to three-dimensional reconstructions of suburban estate complexes and carried out by the author of the proposed publication. Working on this project requires the orderly storage of a large amount of heterogeneous information: text files, images, three-dimensional models, visualization projects and other technical information must be competently distributed in the storage system so that the researcher always has quick access to the necessary materials. According to sources[7], there are more than six hundred former manor complexes in the Moscow region, mostly ruined or completely lost. A significant part of the objects from this list are described in various reference books, memoirs and notes, reconstruction projects have been drawn up for a number of estates, photographs and sketches of these complexes are stored in archives and museums. The total number of materials related to this topic cannot be counted, since their storage locations are scattered, and therefore work with each reconstruction requires a separate trip to museums and archives.

The reconstruction project in question contains four examples of restoring the lost appearance of manor complexes in a computer environment. The author also came up with the idea of publishing the results on a network resource that can hold both source materials and the virtual reconstruction projects in various formats. There are also a number of virtual reconstructions of suburban estates made by students of the Department of Historical Informatics of Lomonosov Moscow State University[8]. Their results are published as articles in electronic journals, but the models themselves and the sources on which they are based remain with the authors and are inaccessible to a wide range of researchers. Yet interest in such works may arise not only among historians specializing in virtual reconstructions, but also among professional architects, museologists, local historians and archaeologists who study the history of estates in the Moscow region. For this reason, the fragmentation of reconstruction results is another obstacle to studying the Russian estate.

One of the goals of the described project is to publish the results in an open electronic directory of estate complexes of the Moscow region (the development of which will be discussed later), to involve researchers in filling this directory, as well as its support after the completion of the project. In the current article, we will analyze how the electronic directory system may look from the point of view of an ordinary user, a project participant and an administrator, and also describe the entire functional system on the basis of which the proposed resource is deployed.

Users' position: electronic directory of manor complexes with backed up resources

When entering the project's website, an ordinary user arrives at the main page with the main records: manor complexes, their owners, reconstructions and main sources. Each record belongs to one category, and each category is shown in a separate block. At the top of the page, the user sees new entries about the estates (Fig. 1); scrolling down reveals the same kind of blocks with the contents of the other categories.

Fig. 1. The main page of the site with a filter under the heading "Estates" (screenshot of the author of the article)

Each category has its own record format, a kind of form that unifies the content of articles. Besides being convenient for users, a single format is necessary for the technical operation of the site (more on this in the second part of the article). The menu on the left side of the page contains a search bar for finding any page or entry on the site, as well as a list of categories and recent entries without filters. In this way, the user goes to the page of the complex he is interested in and sees the basic information and an image of the estate, as well as links to all related records. The differences from an ordinary Internet resource are minimal in this case.

A user who has permission to add content acts a little differently. To fill out the form for a future web page, he needs a login and password for a separate page with the form. The administrator sends these credentials in a letter, after which the user logs in and gets to a page with a selection of tables (Fig. 2). Clicking the corresponding button opens a form to fill in.

Fig. 2. Table selection form (screenshot of the author of the article)

For example, when entering data about estate complexes, the user needs to enter the name of the complex, the time of its foundation, a description and a modern address, and also select a photo file or add a link to an image from open sources (Fig. 3). After that, the user sends the data to the system and can either continue entering new information or go to the website to view the resulting web page (Fig. 4).

Fig. 3. The form for entering data about estate complexes (screenshot of the author of the article)

The data entered through the form, in the required format, gets to the web page and is placed in accordance with the specified template. If necessary, the user can notify the administrator about the incorrect publication of the page, after which the administrator will manually correct the template and configure the display of the record.

Fig. 4. Example of a web page filled out through the form (screenshot of the author of the article)

This is the end of the actions of the site users to upload information. They can use the contents of web pages and download the necessary files if the owner of the information gives permission to download. In the event that permission is not given, the data is only available for viewing.

Thus, the system allows users to add content without significant administrator intervention. Such an algorithm of actions is possible thanks to the use of modern open source data management technologies. Let's consider how this algorithm is implemented from a technical point of view.

Site administrator: data storage and routing

Publishing an Internet resource requires hosting that can store the file system and databases of the site. In this case, external hosting was not used: for development, it is enough to install a local virtual machine (an operating system running on an emulator) and host the site on it using a number of applications. Several components are required to host the resource described above.

The core of the project is an operating system of the Windows or Unix class; the choice makes no difference to the visible part of the resource. In this case, Ubuntu Linux, a Unix-like system, was chosen, since Linux has free distributions (Ubuntu among them), whereas Windows Server licenses are expensive. The Linux environment is the first component of the required "LAMP" stack: Linux, Apache (a web server for hosting the site), MySQL (a database management system) and PHP (a programming language for creating web applications).[9] Many articles describe the installation of these programs; one of them was used to prepare the server for installing the site.[10]

The web interface can be developed in two ways: writing a platform from scratch or using a website builder. In the resource in question, both options apply: the main directory is built on Wordpress, an open source content management system, and the web form is written by hand using Bootstrap (a widely used open source CSS framework) and placed on a separate site. This separation is needed in case one of the components fails: if the form is disabled, the directory will continue to function, and vice versa. Wordpress was also chosen because it has a REST API, an interface for external interaction with the site that makes it possible to load pages programmatically.

The component responsible for routing data to the site interface is Apache NiFi, a web-based orchestrator of streaming data that can connect to different data stores, transform data and redirect it from one point to another (ETL: the process of extracting (Extract), transforming (Transform) and loading (Load) data). The tool is highly flexible, which allows the administrator to change the process settings without losing a significant amount of time. NiFi, like all the programs listed above, is distributed freely, which is an undoubted advantage for low-budget projects.

So, when data is sent from the web form (by clicking the "Send" button, Fig. 3), the web page passes the form data via AJAX code (asynchronous communication between the browser and the web server) to the MySQL database, which uses the user's authorization data when uploading information. An automatic record identifier (id) and the exact time of loading into the system are added to the form fields (Fig. 5). The web form database is kept separate from the main Wordpress database, since manually changing information in the latter can bring down the website completely.

Fig. 5. The first rows of the MySQL table with the data entered via the web form (screenshot of the author of the article)
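
The storage step described above can be illustrated with a minimal sketch. It is not the author's schema: the table name `estates_form`, the column names and the helper functions are hypothetical, and Python's built-in `sqlite3` module stands in for MySQL so that the example runs without a server. In MySQL the same idea would rely on an `AUTO_INCREMENT` key and a `TIMESTAMP` column with a `CURRENT_TIMESTAMP` default.

```python
import sqlite3

# Hypothetical staging table; names are illustrative, not taken from the article.
SCHEMA = """
CREATE TABLE estates_form (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,        -- automatic record identifier
    loaded_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- time of loading into the system
    author      TEXT NOT NULL,                            -- login of the submitting user
    name        TEXT NOT NULL,
    founded     TEXT,
    description TEXT,
    address     TEXT,
    image_url   TEXT
)
"""


def open_staging_db(path=":memory:"):
    """Open the form database (kept apart from the Wordpress database)."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def save_submission(conn, author, form):
    """Store one form submission; id and loaded_at are filled in by the database."""
    cur = conn.execute(
        "INSERT INTO estates_form "
        "(author, name, founded, description, address, image_url) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            author,
            form["name"],
            form.get("founded"),
            form.get("description"),
            form.get("address"),
            form.get("image_url"),
        ),
    )
    conn.commit()
    return cur.lastrowid
```

Because the identifier and the loading time are generated by the database rather than by the form, the later polling step can rely on them without trusting user input.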

Tables linked to web forms are checked regularly by the Apache NiFi orchestrator. The check interval is set to 5 minutes, but the system allows any interval to be configured, to the nearest second. The process consists of seven steps (Fig. 6).

Fig. 6. Apache Nifi process (screenshot of the author of the article)

At the first stage, an SQL query returns the first record loaded after the date of the last check (which is stored in a separate table). Next, the result is split into attributes (the value of each field of the select query is written to a variable), followed by a check for the presence of records in the result (the "RouteOnAttribute" step). If the result is empty (there are no new entries), the system stops working until the next check is started. If a record is found, a JSON structure (a text format of "key: value" pairs) is formed from the received attributes for uploading to Wordpress, and the variable data from the SQL query is inserted into it (Fig. 7). At the "InvokeHTTP" stage, a REST request sends the JSON file to Wordpress. The last step, the Update-SQL query, then replaces the date of the last check with the loading date of the processed record, and the process repeats.

Fig. 7. JSON file generated in Apache Nifi (screenshot of the author of the work)

The Wordpress administrator works with the directory through the management console, which displays all information about the status of the site and the list of pages and records, and where the appearance of the system is configured and user rights are managed. The materials entered through the form appear in the "Records" section and are assigned a category according to the table from which they were uploaded. If the administrator does not want the data to be published on the site automatically, he needs to change the line in the NiFi JSON file from "status": "publish" to "status": "draft". The records will then be uploaded to the system as drafts, and the decision on publication will be made by the administrator.
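
To make the effect of the "status" field concrete, here is a hedged sketch of how such a request to the Wordpress REST API could be prepared in Python. The endpoint `/wp-json/wp/v2/posts` and the fields `title`, `content`, `status` and `categories` belong to the standard Wordpress REST API; the function name, the site URL, the credentials and the use of HTTP Basic authentication with an application password are assumptions made for illustration and are not taken from the article.

```python
import base64
import json
import urllib.request


def build_post_request(site_url, user, app_password, title, content,
                       category_id, status="publish"):
    """Prepare (but do not send) a request that creates a Wordpress post."""
    if status not in ("publish", "draft"):
        raise ValueError("status must be 'publish' or 'draft'")
    body = json.dumps({
        "title": title,
        "content": content,
        "status": status,             # "draft" leaves the decision to the administrator
        "categories": [category_id],  # category chosen according to the source table
    }).encode("utf-8")
    # Assumed authentication scheme: HTTP Basic with an application password.
    token = base64.b64encode(f"{user}:{app_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        site_url.rstrip("/") + "/wp-json/wp/v2/posts",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
```

The prepared request could then be sent with `urllib.request.urlopen(request)`. Passing `status="draft"` is the programmatic counterpart of editing the line in the JSON file: the record reaches Wordpress but waits for the administrator's approval.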

Thus, a number of freely distributed web applications that are used by many organizations, from the creators of Internet blogs to oil-producing corporations, can be applied to historical Internet resources to speed up data loading and to control the design of the output material. Within the project of three-dimensional reconstructions of suburban estates, the presented resource allowed the author to significantly reduce the time spent searching for the necessary information and embedding it in the research work.

References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

The subject of this article is the creation of electronic information resources dedicated to the results of various, mainly research, projects. The author describes the information resource project he is creating, dedicated to three-dimensional reconstructions of suburban estate complexes. A distinctive feature of these reconstructions is the search for and systematization of a large number of diverse sources. A distinctive feature of the resource under consideration is that the project is still in progress and that it organizes the storage of a large amount of heterogeneous, multiformat information. We are talking about more than 600 former manor complexes in the Moscow region, some of which are in a dilapidated state, and some of which are completely lost. As the project progresses, all information about it is published in an open information directory. The relevance of the article is determined, on the one hand, by the need to preserve and reconstruct (including on the basis of digital technologies) objects of historical and cultural heritage of great historical and architectural value and, on the other hand, by the great interest of the scientific community, as well as of a wide range of readers, in modern virtual historical reconstructions. All of the above determines the scientific novelty of the project under consideration and of the article written on the basis of its analysis. In fact, we are talking about a new kind of electronic resource that combines sources, technologies, methods and the results of their application to the processed material. All this is complemented by collective access to information input. The possible creation of an electronic reference book on materials from the Moscow region can become an example for the development of similar resources in other regions. The content of the article is quite traditional for scientific work of the applied genre.
After setting the problem, goals and objectives of the study, the author proceeds to describe the specific project to which the article is devoted. A detailed description of the project site is given. Users interested in posting their information are given the opportunity to create their own part of the site and adjust it together with the project administrator. The project is running in a Linux environment. According to the author, the construction of the website he creates reduces the time of searching and embedding information in research work. The article is written in professional language and in a good scientific style. The article is characterized by a consistent unfolding of the author's judgments in accordance with the logic of scientific presentation, which ensures the reliability of the conclusions drawn. The bibliography of the article contains a sufficient number of references to scientific papers on similar research topics. It is not very large, but it includes a number of necessary articles and electronic resources. The bibliography is distinguished by its competent design. The review of scientific works by other researchers on similar topics, located at the beginning of the article, briefly but clearly explains the details of the considered scientific problem. The reviewed article fully corresponds to the format of the journal "Historical Informatics" and will arouse great interest from readers of different categories. The article is recommended for publication.