
Software systems and computational methods
Reference:

Markdown File Converter to LaTeX Document

Nuriev Marat Gumerovich

PhD in Technical Science

Senior Lecturer, Department of Automated Information Processing and Control Systems, Kazan National Research Technical University named after A.N. Tupolev-KAI

420015, Russia, Republic of Tatarstan, Kazan, Bolshaya Krasnaya str., 55

marat_nu1@mail.ru
Belashova Elena Semenovna

PhD in Physics and Mathematics

Associate Professor of the Computer Systems Department of Kazan National Research Technical University named after A.N. Tupolev-KAI

420015, Russia, Republic of Tatarstan, Kazan, Bolshaya Krasnaya str., 55

bel_lena@mail.ru
Barabash Konstantin Alekseevich

Student, Computer Systems Department, Kazan National Research Technical University named after A.N.Tupolev-KAI

420015, Russia, Republic of Tatarstan, Kazan, Bolshaya Krasnaya str., 55

kostyandriy@mail.ru
DOI:

10.7256/2454-0714.2023.1.39547

EDN:

SNAYLQ

Received:

25-12-2022


Published:

01-01-2023


Abstract: Common text editors such as Microsoft Word, Notepad++ and others are cumbersome. Despite their enormous functionality, they do not eliminate the risk of incorrect document conversion, for example when the same Word file is opened in an older or, conversely, a newer version of Microsoft Word. One solution is to use markup languages, which mark up text blocks so that they are presented in the desired style. Two very popular ones are LaTeX (a set of macro extensions of the TeX typesetting system) and Markdown (a lightweight markup language designed to denote formatting in plain text). The question of converting a Markdown document into a LaTeX document is therefore relevant. Various tools convert Markdown files into LaTeX documents, such as the Pandoc library, Markdown.lua, Lunamark and others, but most of them involve redundant steps when generating the output document. This paper presents a solution that integrates a Markdown file into a LaTeX document, which can potentially reduce the output document generation time compared with existing solutions. The developed Markdown-to-LaTeX converter automatically generates the output document and reduces the possibility of errors that arise when text is converted from Markdown to LaTeX manually.


Keywords:

Markdown, LaTeX, programming, converter, Python, markup language, Overleaf, text conversion, Word, regular expressions

This article is automatically translated.

Introduction

Text editors familiar to users, such as Microsoft Word and Notepad++, are cumbersome.

Despite their extensive functionality, they do not rule out incorrect document conversion, for example when the same Word file is opened in an older or, conversely, a newer version of Microsoft Word. Microsoft Word uses two file extensions, .doc and .docx. The first is outdated, but older versions of Microsoft Word still use it. A .doc file is the document format of the Microsoft Word text editor; .docx is its successor, which is more efficient and produces files that are less susceptible to corruption. Formatting and revisions can be preserved more reliably by using a markup language [1]. Markdown is one such language. It has the following advantages:

  • Versatility: documents with Markdown syntax are plain text files that can be opened in any text editor.
  • Simplicity: Markdown is very easy to learn and requires no additional knowledge.
  • A large selection of tools: because Markdown is so versatile, it can be edited in any editor, so the user's choice is almost unlimited.
  • Convertibility: Markdown documents can easily be exported to formats such as PDF, DOC and ODT, and their formatting remains unchanged [2].

Markup languages, and Markdown in particular, preserve document revisions well, which makes the development of software for converting Markdown files into LaTeX documents relevant.

A LaTeX document is a text file containing markup language commands.

 

Markdown simplified markup language

The purpose of the Markdown language is to make documents easy to write and read.

Outwardly, Markdown resembles HTML, but it is not HTML and cannot replace it, because it has very few types of syntax; the idea of Markdown is to simplify reading, writing and modifying documents. HTML, with its huge number of different tags, is difficult to read, and it is hard to tell what the result will look like. Markdown does not have this problem, because it is as easy to read as possible. To use Markdown, simply apply simple tags to the text. Markdown has block elements (Fig. 1) and inline elements (Fig. 2). As the Markdown examples show, unlike HTML it does not require large nested tag structures to create a formatted paragraph.

Figure 1. Example of block code in Markdown

Figure 2. Example of inline code in Markdown
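The difference between block and inline elements can be illustrated with a short script. This is a minimal sketch, not a reproduction of Figures 1 and 2: the sample text, the pattern tables and the function names classify_blocks and find_inline are assumptions made for illustration, and only a small subset of Markdown syntax is covered.

```python
import re

# Hypothetical sample; not taken from the article's figures.
SAMPLE = """# Heading
A paragraph with *emphasis*, **strong text**, `inline code` and a [link](https://example.com).

- first item
- second item

> a block quote
"""

# Block-level patterns are matched against the start of a line.
BLOCK_PATTERNS = {
    "heading": re.compile(r"^#{1,6}\s"),
    "list item": re.compile(r"^[-*+]\s"),
    "block quote": re.compile(r"^>\s"),
}

# Inline patterns may occur anywhere inside a line.
INLINE_PATTERNS = {
    "strong": re.compile(r"\*\*(.+?)\*\*"),
    "emphasis": re.compile(r"(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)"),
    "inline code": re.compile(r"`([^`]+)`"),
    "link": re.compile(r"\[([^\]]+)\]\(([^)]+)\)"),
}


def classify_blocks(text):
    """Return (kind, line) pairs for non-empty lines; unmatched lines are paragraphs."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        kind = next(
            (name for name, pat in BLOCK_PATTERNS.items() if pat.match(line)),
            "paragraph",
        )
        result.append((kind, line))
    return result


def find_inline(line):
    """Return the names of the inline elements found in one line."""
    return [name for name, pat in INLINE_PATTERNS.items() if pat.search(line)]
```

Block elements are recognized by how a line starts, while inline elements are found inside a line, which is why the two groups use different matching calls (match versus search).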

 

Structure of LaTeX documents

LaTeX is a set of macro extensions for the TeX typesetting system that simplifies the preparation of complex documents.

The basic idea of LaTeX is that authors need to think only about the content, without worrying about the final visual appearance. When developing his document, the author specifies the logical structure of the text (dividing it into chapters, sections, tables, images), and LaTeX solves the issues of its display. This is how the content is separated from the design. At the same time, the design is either determined in advance (standard), or developed for a specific document. A LaTeX document is much less readable than a Markdown document and has a rigid structure that is comparable to program code.

Such a document contains special markup commands and is divided into a preamble and a body. The preamble contains information about the document class, the author, the creation date, and so on. The body contains the document text and markup commands; it is delimited by the \begin{document} and \end{document} commands.
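The division into preamble and body can be shown with a small script. This is a minimal sketch under stated assumptions: MINIMAL_DOC is an illustrative document rather than the template used by the converter, and split_preamble_body is a hypothetical helper name.

```python
# Minimal LaTeX source used only for illustration (not the article's template).
MINIMAL_DOC = r"""\documentclass[12pt]{article}
\title{Sample title}
\author{Sample author}
\date{2023}
\begin{document}
\maketitle
Body text.
\end{document}
"""


def split_preamble_body(source):
    """Split LaTeX source into (preamble, body) around the document environment."""
    begin, end = r"\begin{document}", r"\end{document}"
    start = source.find(begin)
    stop = source.find(end)
    if start == -1 or stop == -1 or stop < start:
        raise ValueError("document environment not found")
    preamble = source[:start].strip()
    body = source[start + len(begin):stop].strip()
    return preamble, body
```

Everything before \begin{document} is returned as the preamble, and the text between the two delimiting commands is returned as the body.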

 

Overleaf Web Editor

Overleaf is a web editor for LaTeX files. In addition to the .tex format, which is the standard LaTeX document format, it can read the .sty format, in which the style of the text can be described. This saves time on editing the text, indentation and so on: it is enough to describe the text by enclosing it in LaTeX document blocks. In addition, it has a user-friendly interface (Fig. 3).

Figure 3. Overleaf Interface

It also allows multiple users to edit the same file, supports almost all LaTeX functions and can compile a document to .pdf format.

Overleaf is written in CoffeeScript and uses Node.js and database management systems such as MongoDB and Redis [3].

 

Prerequisites for the development of the converter

The converter automates the conversion of Markdown documents into LaTeX documents, which eliminates or minimizes manual conversion and avoids errors caused by the human factor.

 

The Markdown-to-LaTeX converter is designed for:

  • Preparing the selected document for conversion.
  • Providing the resulting file with the libraries required to display all features of the source document correctly.
  • Converting the document from .md to .tex format.
  • Creating the resulting files, namely the filtered .md file and the final .tex file.

In addition, the converter must process the selected file quickly (no more than 30 seconds for a 500 KB document) and have an interactive user interface.

Block diagram of the Markdown file converter into a LaTeX document

 

Based on the task, a block diagram of the Markdown-to-LaTeX converter was developed (Fig. 4).

At the file input stage, the user selects the source file, which must be preprocessed before it becomes part of the LaTeX document.

The main stage is file processing, in which the source file is prepared for inclusion in the LaTeX template. The template is a standard LaTeX document that includes all the necessary components. At this stage the document is processed by a program written in Python, after which the file is ready for output. The output consists of two files.

The first file is a Markdown document that stores the text. The second file is a LaTeX template that stores the markup information for the text.

Figure 4. Block diagram of document processing

 

The algorithm of the Markdown file converter to a LaTeX document

The algorithm of the converter is shown in Figure 5.

Figure 5. The algorithm of the Markdown file converter to a LaTeX document

The user selects a Markdown file, and at the same time a LaTeX document is created that stores information about the style and the libraries needed to display the file correctly in .tex format. The TeX components make it possible to use the Russian language, specify the character encoding, and display program code correctly and in color. The text style is set using the following commands:

  • \usepackage{packages/sleek-title}
  • \usepackage{packages/sleek-theorems}
  • \usepackage{packages/sleek-listings}

At the cleaning stage, objects that are difficult or impossible to add to the LaTeX document automatically are removed from the source file. This is the bottleneck of the chosen approach.

Figure 6 shows what the title looks like in a Markdown file. This text cannot be read in a LaTeX document, and the converter reports an error when processing it. Figure 7 shows what the title looks like in a LaTeX document.

Figure 6. An example of a title page in a Markdown file

Figure 7. An example of a title page in a LaTeX document

The title text, the author's full name and the publication date are entered inside the curly brackets of the commands \title{Title}, \author{author} and \date{date}, and the \maketitle command in the body of the LaTeX document creates the title page.

The Markdown file header, however, contains information that the program reads and writes to the beginning of the .md file. The list of sources must also be removed from the original Markdown file, since links are represented by a different construct in a LaTeX document. Next, the Markdown file is added to the LaTeX template: the command \markdownInput{пример.md} attaches the Markdown file to the LaTeX document. This command must follow the \maketitle command, otherwise the title will be created after the main text [4]. An alternative is to place the contents of the Markdown file directly in the body of the LaTeX document instead of the \markdownInput{пример.md} command.
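The ordering constraint between the title commands and the Markdown input can be made explicit in code. This is a minimal sketch, assuming the title data has already been extracted from the Markdown header; title_commands and body_commands are hypothetical names, and the file name passed in is only an example.

```python
def title_commands(title, author, date):
    """Return the preamble commands that hold the title-page data."""
    return [
        rf"\title{{{title}}}",
        rf"\author{{{author}}}",
        rf"\date{{{date}}}",
    ]


def body_commands(markdown_file):
    """Return the document body: title page first, then the Markdown text."""
    return [
        r"\begin{document}",
        r"\maketitle",  # must come before \markdownInput
        rf"\markdownInput{{{markdown_file}}}",
        r"\end{document}",
    ]
```

Keeping the body as an ordered list makes it hard to emit \markdownInput before \maketitle by accident.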

 

Software implementation of the Markdown file converter into a LaTeX document

The software implementation consists of several stages.

In the first stage, the cleaning block extracts useful information and writes it to an intermediate file. The second stage is a second cleaning pass, which fully prepares the document for conversion by removing the fragments that cannot be converted. After processing by the cleaning module, the resulting file is attached to the LaTeX template using the command \markdownInput{file}; the template file is created once the cleaned file is ready. The last stage is the output of two resulting files: a LaTeX file, which stores a link to the Markdown file together with the text markup data and the LaTeX libraries needed for correct operation and display, and the Markdown file itself. Figure 8 shows a general block diagram of the software.

Figure 8. Block diagram of the Markdown file converter software into a LaTeX document

Thus, the software can be divided into several modules:

  1. Cleaning module;
  2. LaTeX Template Creation Module;
  3. The output module of the result.

The cleaning module opens the file and processes the text of the source file line by line using the while operator.

The first line of the processed file contains the value “---”. In Markdown syntax this value produces a horizontal line; the block enclosed by a pair of such values holds the file header, which cannot be translated into LaTeX. The Python expression file = open("имя_файла.md", "r", encoding='utf-8') opens the file "имя_файла.md" for reading in UTF-8 encoding. Next, the file.readline instruction reads the first line of the file and advances past it, after its value has been stored in a variable. This is required in order to find the end of the header. Listing 1 shows the code of the program that performs the first stage of cleaning the file.

Listing 1. The first stage of cleaning
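The first cleaning stage might look roughly as follows. This is a minimal sketch and not the article's Listing 1: it assumes that the header is a block delimited by two "---" lines and that the title is stored on a line of the form "title: ...". The names TITLE_RE, strip_header and first_stage are hypothetical.

```python
import re

# Assumed header key; the exact header layout is not given in the text.
TITLE_RE = re.compile(r"^title:\s*(.+)$")


def strip_header(lines):
    """Remove a leading '---' ... '---' header; return (title, remaining_lines)."""
    if not lines or lines[0].strip() != "---":
        return None, list(lines)  # no header: leave the text untouched
    title = None
    index = 1
    while index < len(lines):
        line = lines[index].strip()
        if line == "---":  # second border: the header ends here
            return title, list(lines[index + 1:])
        match = TITLE_RE.match(line)
        if match:
            title = match.group(1).strip().strip('"')
        index += 1
    # No closing border found: treat the input as having no header.
    return None, list(lines)


def first_stage(source_path, filtered_path):
    """Write the header-free text to an intermediate file and return the title."""
    with open(source_path, "r", encoding="utf-8") as src:
        title, body = strip_header(src.readlines())
    with open(filtered_path, "w", encoding="utf-8") as dst:
        dst.writelines(body)
    return title
```

The loop stops at the second border, so only the header lines are inspected and the rest of the file is copied without being examined.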

 

As a result of executing this code (see Listing 1), the header is removed from the file, and the value of the header field marked title in the source file is written to the header variable. The search uses a regular expression, which is a convenient tool for working with text. The text of the source file is written to the intermediate file starting from the line after the second border of the header. This solution deletes the header in 4 iterations, which increases processing speed, since not every line of the file has to be examined.

The second stage of processing deletes the list of sources. Here the filter_file file created in the first stage is processed. The task of this stage is to find and delete the list of sources, since in the Markdown file it is formed using commands that are not translated into LaTeX and cause an error during conversion. Listing 2 shows the program code that performs the second stage of file cleanup.

Listing 2. The second stage of cleaning

 

Regular expressions are used here as well. The sourse_list_check(line, line_index) function accepts the current line and its number, searches for a given pattern, and returns the line number on a match. The line number is stored and the loop exits; the data from the intermediate file that precedes the line where the list of sources begins is then written to the resulting file. To do this, 1 is subtracted from the value found by the function, otherwise the first line of the list of sources would be included in the file. The two cleaning stages produce the resulting Markdown file.
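The second stage can be sketched in the same way. This is a minimal sketch, not the article's Listing 2: the function name sourse_list_check and its parameters follow the text, but its body, the pattern SOURCES_RE (a heading such as "## References") and the helper drop_sources are assumptions.

```python
import re

# Assumed pattern: a heading such as "## References" or "# Sources" starts the list.
SOURCES_RE = re.compile(
    r"^#{1,6}\s*(references|sources|bibliography)\s*$", re.IGNORECASE
)


def sourse_list_check(line, line_index):
    """Return line_index if the line starts the list of sources, else None.

    The name follows the article's spelling; the body is a guess.
    """
    return line_index if SOURCES_RE.match(line.strip()) else None


def drop_sources(lines):
    """Keep only the lines that precede the list of sources."""
    for number, line in enumerate(lines, start=1):  # 1-based line numbers
        found = sourse_list_check(line, number)
        if found is not None:
            return lines[:found - 1]  # subtract 1 so the heading itself is excluded
    return list(lines)
```

With 1-based line numbers, slicing up to found - 1 keeps exactly the lines before the heading, which corresponds to the subtraction described in the text.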

 

LaTeX Template Creation Module

The LaTeX template was created from an empty LaTeX document template, to which the required libraries were added.

Listing 3 shows the finished LaTeX template.

Listing 3. LaTeX template

 

\documentclass[12pt, letterpaper]{article} defines the document type. Additional parameters, enclosed in square brackets and separated by commas, can be passed to the command. In the example, they specify the font size (12pt; the default is 10pt) and the paper size (letterpaper).

\usepackage[utf8]{inputenc} sets the document encoding, which allows non-ASCII characters to be used in the text (for example, letters with diacritics). It can be omitted or changed to another encoding, but UTF-8 is recommended.

\usepackage[english,russian]{babel} defines the languages used; Russian is not available by default.

\usepackage[T1]{fontenc} sets the font encoding, which determines which font is used.

\usepackage[fencedCode,inlineFootnotes,citations,definitionLists,hashEnumerators,smartellipses,hybrid]{markdown} loads the markdown package with the set of options required to display Markdown features correctly.

fencedCode – used to display the code correctly.

hashEnumerators – used to create ordered lists.

inlineFootnotes – allows you to create links to external sites.

hybrid – allows you to use LaTeX code inside a Markdown document.

The lines

\usepackage{packages/sleek-title}

\usepackage{packages/sleek-theorems}

\usepackage{packages/sleek-listings}

\usepackage[noheader]{packages/sleek}

connect the corresponding template; the noheader parameter turns off the standard style header.
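The template can also be assembled programmatically from the commands described above. This is a minimal sketch rather than the article's Listing 3: the package lines repeat those given in the text, the sleek package files are assumed to be available under packages/, and PREAMBLE_LINES and render_template are hypothetical names.

```python
# Preamble lines as described in the text; the order follows the article.
PREAMBLE_LINES = [
    r"\documentclass[12pt, letterpaper]{article}",
    r"\usepackage[utf8]{inputenc}",
    r"\usepackage[english,russian]{babel}",
    r"\usepackage[T1]{fontenc}",
    r"\usepackage[fencedCode,inlineFootnotes,citations,definitionLists,"
    r"hashEnumerators,smartellipses,hybrid]{markdown}",
    r"\usepackage{packages/sleek-title}",
    r"\usepackage{packages/sleek-theorems}",
    r"\usepackage{packages/sleek-listings}",
    r"\usepackage[noheader]{packages/sleek}",
]


def render_template(body_lines):
    """Return the full template text: preamble, then the body inside the document environment."""
    lines = list(PREAMBLE_LINES)
    lines.append(r"\begin{document}")
    lines.extend(body_lines)
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Keeping the preamble in a list makes it straightforward to add or remove a package without touching the code that writes the body.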

The template files describe the text markup and the output styles of equations, code, headers and plain text. The result is produced once the cleaning block has finished, but the user cannot read it directly, since it is encoded.

The Overleaf service reads the resulting file and converts it to PDF format. This solution has a number of advantages, for example high performance and the ability to edit the Markdown file after it has been included in the LaTeX document, which greatly increases the flexibility of the solution.

 

Conclusion

As a result of this work, a converter of Markdown files to LaTeX documents was developed, aimed at eliminating routine manual work on converting Markdown files to LaTeX documents.

Tasks solved:

  1. The converter software project is presented;
  2. The algorithm of cleaning modules operation has been developed and implemented;
  3. A module for generating an output document with a specified LaTeX template has been developed.

In the future, it is planned to develop this project by expanding the application with new output document styles, optimizing the code according to the performance criterion [5,6], adding new features such as batch file processing, integrating the program as a browser extension and providing access to the database with the resulting documents in compliance with the principles of information security [7,8].
References
1. Mehmonov I. N. Tools for automated formation of dynamic documents // Applied mathematics and informatics: Modern research in the field of natural and technical sciences. – 2020. – pp. 883-886.
2. Pavlov D. A. Automatic layout and design of scientific and program documentation // Computer tools in education. – 2018. – No. 6. – pp. 39-46.
3. B. Luo, W. Zhu, P. Li and Z. Han, "Distributed Dynamic Cuckoo Filter System Based on Redis Cluster," 2018 IEEE 4th International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS), 2018, pp. 244-247, doi: 10.1109/BDS/HPSC/IDS18.2018.00059.
4. J. Tippayachai and S. Kiattisin, "Academic Publishing Solution Based on LATEX Class Package Implementation for ITMSOC Journal," 2018 3rd Technology Innovation Management and Engineering Science International Conference (TIMES-iCON), 2018, pp. 1-5, doi: 10.1109/TIMES-iCON.2018.8621689.
5. Gibadullin, R.F., Lekomtsev, D.V. & Perukhin, M.Y. Analysis of Industrial Network Parameters Using Neural Network Processing. Sci. Tech. Inf. Proc. 48, 446–451 (2021). https://doi.org/10.3103/S0147688221060046.
6. Gibadullin R.F. Flow-safe calls of controls in enriched client applications // Software Systems and Computational Methods. – 2022. – No. 4. – pp. 1-19. DOI: 10.7256/2454-0714.2022.4.39029 EDN: IAXOMA URL: https://nbpublish.com/library_read_article.php?id=39029.
7. Gibadullin R.F. Organization of secure data transmission in sensor network based on AVR microcontrollers // Cybernetics and Programming. – 2018. – No. 6. – pp. 80-86. DOI: 10.25136/2306-4196.2018.6.24048 URL: https://nbpublish.com/library_read_article.php?id=24048.
8. Gibadullin R.F. Development of Uniform Formalism for Protection of Point, Linear and Area Objects in Cartography // Bulletin of Kazan State Technical University named after A.N. Tupolev. – 2010. – No. 2. – pp. 101-105.

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

The subject of the research is the development of a Markdown file converter into a LaTeX document. The research methodology is based on a combination of theoretical and empirical approaches using methods of analysis, generalization, comparison, synthesis, and programming. The relevance of the research is determined by the widespread use of information and communication technologies, the importance of designing and implementing appropriate software products. The scientific novelty is associated with the author's development of a software product (converter), which includes an algorithm for the operation of cleaning modules, as well as a module for generating an output document with a given LaTeX template. The article is written in Russian literary language. The style of presentation is scientific. The structure of the manuscript includes the following sections: Introduction ("bulky" text editors Microsoft Word, Notepad++, etc., the risk of incorrect document conversion, markup language, advantages of Markdown), Markdown simplified markup language (the purpose of the language, differences from HTML), the structure of LaTeX documents (the basic idea of LaTeX, a LaTeX document), Overleaf web editor (.sty format, user-friendly interface, file editing for several users), Prerequisites for the converter development (block diagram of the Markdown file converter to a LaTeX document, block diagram of document processing, algorithm of the Markdown file converter to a LaTeX document, an example of a title page in a Markdown file, an example of a title page in a LaTeX document), Software implementation of the Markdown file converter to a LaTeX document (a block diagram of the software for converting Markdown files to a LaTeX document, cleaning modules, creating a LaTeX template, displaying the result), a module for creating a LaTeX template (text markup, styles for displaying equations, code, headers, plain text), Conclusion (conclusions), Bibliography.
The text includes eight figures and three listings. Listings can also be indicated by drawings. The content generally corresponds to the title. The description of the Markdown file converter into a LaTeX document aimed at eliminating routine manual work is presented, and the prospects for the development of the project are identified (expansion with new styles, code optimization, increased performance, adding batch file processing, etc.). The bibliography includes eight sources of foreign authors (scientific articles). Bibliographic descriptions of some sources require adjustments in accordance with GOST and editorial requirements, for example: 3. Luo B., Zhu W., Li P., Han Z. Distributed Dynamic Cuckoo Filter System Based on Redis Cluster // IEEE 4th International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS). Place of publication ???, 2018. P. 244-247. Excessive self-citation is possible (Gibadullin R. F. and co-authors). An appeal to opponents (Mekhmonov I. N., Pavlov D. A., Luo B., Zhu W., Li P., Han Z., Tippayachai J., Kiattisin S., etc.) takes place. In general, the material is of interest to the readership and, after revision, can be published in the journal "Software Systems and Computational Methods".