Gusenko M.Y. —
Creating a common notation of the x86 processor software interface for automated disassembler construction
// Software systems and computational methods. – 2024. – № 2.
– P. 119 - 146.
DOI: 10.7256/2454-0714.2024.2.70951
URL: https://en.e-notabene.ru/itmag/article_70951.html
Abstract: The subject of the study is the reverse engineering of programs to recover their source code in low- or high-level languages for processors with the x86 architecture, whose software interface is developed by Intel and AMD. The object of the study is the technical specifications in the documentation published by these companies. The study examines how often the processor documentation is updated and justifies the need for technological approaches to automated disassembler construction that keep pace with the regular and frequent updates of the processor software interface. The article presents a method for processing the documentation to obtain a generalized, formalized, and uniform specification of processor commands for subsequent automated translation into disassembler program code.
The article presents two main results. The first is an analysis of the various ways commands are described in the Intel and AMD documentation and a reduction of these descriptions to a uniform representation. The second is a comprehensive syntactic analysis of the notations used to describe machine code and of the assembly-language representation of each command. Together with some additional details of the command descriptions, such as the processor operating modes in which a command is permitted, this made it possible to create a generalized description of each command that can be translated into disassembler code. The study also identified a number of errors both in the documentation texts and in existing industrial disassemblers, which, as analysis of their implementations shows, are built by manual coding. The identification of such errors in existing reverse engineering tools is a by-product of the author's research.
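As a rough illustration of how a uniform command description might drive a disassembler, the Python sketch below stores each command as a record holding its opcode, its mnemonic, and the processor modes in which it is permitted, and decodes bytes by table lookup. The record layout and the names CommandSpec, SPECS, TABLE, and disassemble are assumptions made for this example and are not the notation developed in the article. Only a few one-byte, operand-free commands are covered.

```python
from dataclasses import dataclass

# Hypothetical record layout: one entry per command, in a single uniform form.
@dataclass(frozen=True)
class CommandSpec:
    opcode: bytes         # machine-code encoding (one byte in this sketch)
    mnemonic: str         # assembly-language representation
    modes: frozenset      # processor modes in which the command is permitted

ALL_MODES = frozenset({"16", "32", "64"})
LEGACY_ONLY = frozenset({"16", "32"})  # encoding is invalid in 64-bit mode

# A tiny specification table; a real one would be generated from documentation.
SPECS = [
    CommandSpec(b"\x90", "NOP", ALL_MODES),
    CommandSpec(b"\xC3", "RET", ALL_MODES),
    CommandSpec(b"\xCC", "INT3", ALL_MODES),
    CommandSpec(b"\x27", "DAA", LEGACY_ONLY),
]

# Lookup table derived mechanically from the specification.
TABLE = {spec.opcode: spec for spec in SPECS}

def disassemble(code: bytes, mode: str) -> list:
    """Decode a sequence of one-byte commands for the given processor mode."""
    result = []
    for byte in code:
        spec = TABLE.get(bytes([byte]))
        if spec is None or mode not in spec.modes:
            result.append("(bad)")  # unknown opcode or not valid in this mode
        else:
            result.append(spec.mnemonic)
    return result

print(disassemble(b"\x90\x27\xC3", "32"))
print(disassemble(b"\x90\x27\xC3", "64"))
```

Because the decoder reads only the table, supporting a documentation update in this sketch means regenerating SPECS rather than editing decoding logic by hand.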