Thorvaldsen G. —
Automating Historical Source Transcription with Record Linkage Techniques. Work in progress on the 1950 census for Norway
// Historical informatics. – 2018. – № 1.
– P. 94 - 103.
DOI: 10.7256/2585-7797.2018.1.25686
URL: https://en.e-notabene.ru/istinf/article_25686.html
Abstract: The article addresses the transcription of handwritten materials from the 1950 Norwegian Population Census, which consist of 801,000 scanned double-sided questionnaires. Optical character recognition programs have been improving for over four decades, and researchers now aim to extend similar techniques to handwritten historical source material. The article reviews work on handwritten text recognition carried out by the Center of Historical Documents at the University of Tromsø and considers how various text recognition techniques apply to nominative sources. Because individual handwritten characters are difficult to distinguish and separate, whole words are clustered mathematically by image similarity or searched for in sources that have been transcribed earlier. After quality control of the recognition results, the software uses the line numbers to place the information taken from the transcribed cells, which then becomes part of the census database. Special software has also been developed to process handwritten numerical codes, data on occupations and education, etc. The methods presented in the article improve the quality of handwritten text transcription and can be used to recognize nominative sources in Russia, for instance parish registers and vital records. The main remaining goals are to find methods and algorithms that optimally link different variables and to streamline interactive proofreading.
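The idea of searching for a recognized word within previously transcribed sources can be illustrated with a minimal sketch. It is not the software described in the article: it assumes the recognizer has already produced a noisy text string, and it substitutes string similarity from Python's standard `difflib` module for the image similarity mentioned above. The name list, the `link_to_lexicon` function, and the `cutoff` value are hypothetical and chosen only for illustration.

```python
from difflib import get_close_matches

# Hypothetical lexicon of surnames already transcribed from earlier sources.
KNOWN_NAMES = ["Hansen", "Johansen", "Olsen", "Larsen", "Andersen"]


def link_to_lexicon(raw, lexicon=KNOWN_NAMES, cutoff=0.8):
    """Return the closest previously transcribed name for a noisy reading.

    Returns None when no lexicon entry reaches the similarity cutoff, so the
    cell can be passed on to interactive proofreading instead.
    """
    matches = get_close_matches(raw, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # A reading with one misrecognized character, and one with no plausible match.
    for reading in ["Hanscn", "Xyzzy"]:
        print(reading, "->", link_to_lexicon(reading))
```

A higher cutoff sends more cells to manual proofreading, while a lower one accepts more automatic matches at the risk of wrong links; where to set it would depend on the source and is left open here.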