Gusenko M. —
The use of regular expressions for decompiling static data
// Software systems and computational methods. – 2017. – № 2.
– P. 1 - 13.
DOI: 10.7256/2454-0714.2017.2.22608
URL: https://en.e-notabene.ru/itmag/article_22608.html
Abstract: The subject of the study is the process of decompiling programs into source code in high-level languages. The author shows where decompilation sits in the program transformation cycle, which includes canonization, compilation, optimization, and decompilation. The object of the study is the compiled equivalent of a static data description written in a high-level programming language. In the general case this is a nontrivial mapping of the language's syntactic constructions into byte sequences that are located in executable program modules and constructed with regard to the optimization techniques of a given microprocessor architecture. The paper treats static data decompilation as the reconstruction of the program's parse tree, recovered during analysis of its executable code, and treats the data itself as a binary sequence in the memory of a von Neumann machine, analyzed by a regular expression that the decompiler creates from the supposed description of the data. Regular expressions are traditionally used to analyze character sequences. The article presents another area of application for this tool: proving the hypothesis that a given byte array of an executable module is the equivalent of compiled static data. The author suggests a variant of the corresponding regular expression syntax. The article shows that the proposed method can also be used to verify the quality of the decompiled code.
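The idea of testing a byte array against a regular expression derived from a supposed data description can be illustrated with a minimal sketch. It uses Python's standard `re` module on `bytes`, not the regular expression syntax proposed in the article. The declaration, the layout (little-endian, 4-byte alignment), and the names `RECORD` and `matches_hypothesis` are assumptions introduced only for illustration.

```python
import re
import struct

# Hypothetical declaration under test (an assumption, not from the article):
#   struct { char flag; int value; char name[8]; } table[N];
# Assumed layout: little-endian, int aligned to 4 bytes, 16 bytes per record.
RECORD = (
    rb"[\x00\x01]"                      # flag: expected to be 0 or 1
    rb".{3}"                            # alignment padding: any three bytes
    rb".{4}"                            # value: any 32-bit integer
    rb"(?=[\x20-\x7e]{0,7}\x00).{8}"    # name: printable text with a NUL within 8 bytes
)

def matches_hypothesis(data: bytes, count: int) -> bool:
    """Return True if `data` is exactly `count` records of the assumed layout."""
    # DOTALL lets '.' match every byte value, including 0x0A.
    pattern = re.compile(b"(?:%b){%d}" % (RECORD, count), re.DOTALL)
    return pattern.fullmatch(data) is not None

# Build a sample byte array the way a compiler might lay out two records.
good = (struct.pack("<b3xi8s", 1, 42, b"alpha")
        + struct.pack("<b3xi8s", 0, -7, b"beta"))
bad = bytes([7]) + good[1:]             # flag byte outside the expected set

print(matches_hypothesis(good, 2))      # hypothesis accepted
print(matches_hypothesis(bad, 2))       # hypothesis rejected
```

A match only shows that the bytes are consistent with the supposed description. Fields that accept any value, such as the integer here, constrain nothing but the length, so the strength of the check depends on how restrictive the pattern is.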