This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Software systems and computational methods

Reference:

Lyutikova L.A. Application of logical modeling for the analysis and classification of medical data for the purpose of diagnosis. // Software systems and computational methods. 2023. № 4. P. 61-72. DOI: 10.7256/2454-0714.2023.4.68876 EDN: KIUUOL URL: https://en.nbpublish.com/library_read_article.php?id=68876

Application of logical modeling for the analysis and classification of medical data for the purpose of diagnosis.

Lyutikova Larisa Adol'fovna

PhD in Physics and Mathematics

Head of department, Institute of Applied Mathematics and Automation

360000, Russia, respublika Kabardino-Balkariya, g. Nal'chik, ul. Shortanova, 89a

lylarisa@yandex.ru

DOI: 10.7256/2454-0714.2023.4.68876

EDN: KIUUOL

Received: 03-11-2023

Published: 25-11-2023

Abstract: The subject of the research is a logical approach to data analysis and the development of software tools capable of identifying hidden patterns, even with a limited amount of data. The input data consists of indicators of the diagnosis of patients, their diagnoses and the experience of doctors obtained in the course of medical practice. The research method is the development of software tools based on systems of multivalued predicate logic for the analysis of patient data. This approach considers the source data as a set of general rules, among which it is possible to distinguish those rules that are sufficient to explain all the observed data. These rules, in turn, are generative for the area under consideration and help to better understand the nature of the objects under study. The novelty of the study lies in the use of multivalued logic to analyze a limited amount of medical data of patients in order to determine the most likely diagnosis with a given accuracy. The proposed approach makes it possible to detect hidden patterns in the symptoms and results of patient examinations, classify them and identify unique signs of various forms of gastritis. Unlike neural networks, logical analysis is transparent and does not require training on large amounts of data. The conclusions of the study show the possibility of such an approach for diagnosis with a lack of information, as well as the offer of alternatives if the required accuracy of diagnosis is not achieved.

Keywords:

diagnostics, relationships, multivalued logic, data, analysis, hidden patterns, classifier, production rules, training, properties
This article is automatically translated.

Introduction

Medical diagnostics makes active use of machine learning methods, which analyze data efficiently and support the diagnostic process.

One of their advantages is the ability to process large amounts of information and find hidden patterns that help predict diseases and make decisions. Unlike manual coding, the algorithms find features on their own and build models from the training data.

Existing methods can be grouped by task: classification of patients by symptoms to diagnose specific diseases; clustering to identify subgroups of patients with similar features and to select treatment individually; prognosis of the course of the disease, risk assessment and choice of treatment; and analysis of medical images and signals using deep learning.

However, machine learning methods run into various difficulties in medical diagnostics. The quality of the solutions can suffer badly from a shortage of high-quality training data, since much medical data is confidential and collecting it takes a lot of effort. In some cases the model overfits: it works well on the training sample but generalizes poorly to new data. The system may also need to be updated constantly as new knowledge and data become available ^[1-4].

Given these problems, this paper proposes logical methods for building a diagnostic model. Models based on logical rules and constraints are more understandable and transparent for doctors and other specialists than the "black boxes" of deep learning. Logical methods can also take expert knowledge into account and draw conclusions from incomplete data, whereas neural networks require a large amount of data. Finally, logical models are easy to adjust and supplement as new knowledge is gained, unlike regression models or multilayer perceptrons (MLP).

1. Materials and methods

The aim of the study is to create a machine learning model to support the diagnosis of gastritis based on available patient data.

Gastritis is a common disease that affects up to 20% of the adult population in developed countries, and there is also a high incidence among children. To accurately determine chronic gastritis, a histological examination of biopsies of the gastric mucosa is required, since this allows you to identify morphological changes, while the clinical picture may be uninformative.

In 1990, the Sydney Gastritis Classification System was adopted at the IX International Congress of Gastroenterologists. In 1996, the final version of the modified Sydney system was published, known as "Classification and Gradation of Gastritis. Modified Sydney System". According to this classification, there are three main categories of gastritis: acute, chronic and special forms.

For our study, we used the results of a histological examination of gastric biopsies from 132 patients, conducted between 2019 and 2022 at the Pathology Bureau belonging to the State Healthcare Institution. These data are valuable source material for our research. They allow us to analyze and study various pathological conditions of the stomach, identify links between various factors and pathologies, and develop models and diagnostic algorithms based on these data.

Using the results of histological examination as the basis for a training sample provides us with the opportunity to create a reliable and informative model that can be applied in future research and clinical practice for more accurate and effective diagnosis of gastric pathologies. This will help to improve the diagnostic process and develop more effective methods of treating gastritis.

It is necessary to build an algorithm capable of classifying new patients based on these data and determining a probable diagnosis to help doctors in diagnosis.

Given the small amount of data, a logical approach to machine learning was chosen, where logic is understood as the science of correct reasoning, regardless of the field in which it is applied.

In this task, the approach has a number of advantages over other methods: the resulting models are easy to interpret and explain, it works effectively with small samples, it integrates expert knowledge and rules, and it is more robust to noise and outliers in the data.

The model will be presented as a set of logical rules that allow determining the most likely diagnosis of gastritis based on symptoms and examination results. This will facilitate the work of doctors ^[5-6].

As an illustration, a fragment of a standardized form is shown in Figure 1, containing a list of symptoms and results of patient examinations that were taken into account when establishing the diagnosis of gastritis.

This form was used to record the clinical picture of a particular patient and is part of the initial data on the basis of which the proposed model will be trained. Filling out such a form by a doctor allows you to structure information about the patient and the symptoms of the disease.

Figure 1. Fragment of the questionnaire of input data

Mathematical formulation: it is necessary to find a function of 28 variables that is defined at 132 points, where the domain of each variable contains from 2 to 4 values, and to restore the value of the function at other requested points.
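To give a sense of how sparse the sample is, the size of the domain can be bounded from the figures above. The symbols $D$ (the set of all admissible combinations of sign values) and $k_j$ (the number of options of the $j$-th variable) are introduced here only for illustration:

$$
2^{28} \le |D| = \prod_{j=1}^{28} k_j \le 4^{28}, \qquad 2^{28} = 268\,435\,456, \qquad 4^{28} = 2^{56} \approx 7.2 \times 10^{16}.
$$

Even in the smallest case, the 132 observed points cover no more than $132 / 2^{28} \approx 4.9 \times 10^{-7}$ of the domain, so almost every requested point lies outside the sample.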

The solution is to extract patterns from these existing examples with an established diagnosis and build a model based on the analysis of data from past cases.

The model should take into account the characteristics of previous patients, the relationship of symptoms with the diagnosis, and based on this be able to classify new cases.

Thus, the problem is formulated as the extraction of knowledge from an existing set of previously solved similar problems in order to predict solutions for new problems of the same class.
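As a minimal sketch of this formulation (not the author's implementation), the training sample can be held as a mapping from coded sign vectors to the diagnoses observed for them. The names `build_table` and `lookup`, the integer coding and the three-sign toy vectors are assumptions made for illustration; the real records have 28 signs.

```python
from typing import Dict, Iterable, Set, Tuple

Signs = Tuple[int, ...]  # one coded answer per sign (hypothetical integer coding)


def build_table(cases: Iterable[Tuple[Signs, str]]) -> Dict[Signs, Set[str]]:
    """Map each observed sign vector to the set of diagnoses recorded for it."""
    table: Dict[Signs, Set[str]] = {}
    for signs, diagnosis in cases:
        # A set is used so that contradictory records are kept, not overwritten.
        table.setdefault(signs, set()).add(diagnosis)
    return table


def lookup(table: Dict[Signs, Set[str]], signs: Signs) -> Set[str]:
    """Return the known diagnoses for a point, or an empty set if it is unseen."""
    return table.get(signs, set())


# Toy sample with three signs instead of 28.
cases = [((1, 0, 2), "CGA"), ((0, 0, 1), "norm"), ((1, 1, 0), "CGP")]
table = build_table(cases)
```

An empty result for an unseen vector is the case the model has to handle: the value of the function must be restored at points outside the sample.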

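Let $X = \{x_1, x_2, \dots, x_n\}$ be the set of symptoms of the diagnosed diseases and $W = \{w_1, w_2, \dots, w_m\}$ the set of possible diagnoses; each diagnosis $w_i$ is characterized by a corresponding set of symptoms $x_1(w_i), \dots, x_n(w_i)$.

This can be represented in the following form:

$$
\begin{pmatrix}
x_1(w_1) & x_2(w_1) & \dots & x_n(w_1) \\
x_1(w_2) & x_2(w_2) & \dots & x_n(w_2) \\
\vdots & \vdots & \ddots & \vdots \\
x_1(w_m) & x_2(w_m) & \dots & x_n(w_m)
\end{pmatrix}
\to
\begin{pmatrix}
w_1 \\ w_2 \\ \vdots \\ w_m
\end{pmatrix}
$$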

It should be borne in mind that an experienced specialist can provide information about the diagnosis of gastritis based on clinical thinking, which may be more complete and objective compared to automatic data analysis. Nevertheless, the formalized logical approach makes it possible to identify objective statistical patterns in the data and identify the most significant signs for diagnosis.

At the same time, it is necessary to take into account the valuable practical experience of doctors and their expert knowledge about diagnostic capabilities ^[7]. The best results can be achieved if you combine logical data analysis with the expertise of medical specialists. This will allow us to develop a model based on both statistical patterns and deep knowledge in the field. This will lead to a more accurate understanding of the process of diagnosis of gastritis ^[8].

2. Results

In the tasks of medical diagnostics, it is often necessary to work with incomplete and contradictory information. Logical analysis of the data allows us to identify both explicit and hidden patterns established statistically. This makes it possible to determine a minimally sufficient set of features to explain all the observed patterns.

Under the direct supervision of doctors, a histological examination map (CGI) was developed.

The CGI is a tool for the diagnosis and analysis of pathological conditions, developed jointly with pathologists. It consists of two parts:

The first part contains information about the patient, such as last name, initials, gender, age, as well as the date and number of the study. It also includes 28 diagnostic signs that are established during histological examination. These signs are organized in a certain sequence and represent a variant of the original diagnostic algorithm. The doctor conducting the study records the values of these signs in the CGI in strict sequence.

The second part, called "Diagnosis", contains the main target signs. The main target feature is the diagnosis itself, which can be one of three values: "norm", "chronic superficial gastritis (CGP)" or "chronic atrophic gastritis (CGA)". Depending on the chosen diagnosis, it is possible to include up to 9 additional signs, such as topography, etiology and activity.

CGI allows to systematize and standardize the process of diagnosis of histological samples and provides a unified approach to the assessment and classification of pathological conditions. It provides doctors with complete information about patients and their histological data, which contributes to a more accurate and reliable diagnosis. CGI can also be used in research and data analysis to identify links between various signs and pathological conditions.

A model built on a minimally sufficient set of features is more compact than the original data set, and it is also more reliable and faster to process.

A system of rules is considered complete if all available solutions can be reproduced on its basis.

It is convenient to designate a group of diagnoses separated by common signs or symptoms as a class.

The goal is to build such a model based on statistical analysis of available medical data.

Each diagnosis can be a representative of one or more classes, and each class is determined by a set of similar symptoms ^[9].

Logical analysis of data allows you to identify patterns in them and get a set of logical statements (rules) that fully describe these patterns.

This makes it possible to classify existing diagnoses and divide them into groups depending on the similarity of symptoms.

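For each individual patient, a deterministic condition (rule) can be formulated that establishes the relationship between the presence of a certain combination of symptoms and a specific diagnosis:

$$
\bigwedge_{j=1}^{n} x_j(w_i) \to P(w_i), \qquad i = 1, \dots, m,
$$

where $x_j(w_i)$ is the value of the $j$-th sign recorded for the $i$-th case. This is a general production rule, where the predicate $P(w_i)$ takes the value true, i.e. $P(w_i) = 1$ if $w = w_i$, and $P(w_i) = 0$ if $w \ne w_i$.

This rule can be written in another form:

$$
\bigvee_{j=1}^{n} \overline{x_j(w_i)} \vee P(w_i), \qquad i = 1, \dots, m.
$$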

A generalizing model for an existing data set can be constructed as a logical multiplication (conjunction) of all individual classification rules. That is, we will get a single logical expression that takes into account all possible variants of matching the combination of signs and diagnoses for all objects in the sample. This will allow us to describe in the form of a single function the relationship between signs and diagnoses for the entire set of data, combining all the particular rules into a holistic model. The conjunctive function obtained in this way will fully take into account the relationships inherent in the considered set of medical data.
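The conjunction of individual rules can be sketched as follows. This is an illustration under stated assumptions rather than the program described in the article: each rule is stored as a pair (coded signs, diagnosis), matching is done by plain equality of coded values, and the names `rule_holds`, `model` and `admissible` are hypothetical.

```python
from typing import List, Sequence, Tuple

Signs = Tuple[int, ...]
Rule = Tuple[Signs, str]  # (coded signs of one patient, diagnosis)


def rule_holds(rule: Rule, signs: Signs, diagnosis: str) -> bool:
    """Implication: if the signs match the rule, the diagnosis must match too."""
    rule_signs, rule_diagnosis = rule
    # "a implies b" is evaluated as "not a, or b".
    return signs != rule_signs or diagnosis == rule_diagnosis


def model(rules: Sequence[Rule], signs: Signs, diagnosis: str) -> bool:
    """Logical multiplication (conjunction) of all individual rules."""
    return all(rule_holds(rule, signs, diagnosis) for rule in rules)


def admissible(rules: Sequence[Rule], signs: Signs,
               diagnoses: Sequence[str]) -> List[str]:
    """Diagnoses that do not contradict any stored rule for these signs."""
    return [d for d in diagnoses if model(rules, signs, d)]


DIAGNOSES = ["norm", "CGP", "CGA"]
rules = [((1, 0, 2), "CGA"), ((0, 0, 1), "norm")]
```

For a sign vector that matches a stored rule, only the recorded diagnosis is admissible; for an unseen vector no rule is contradicted, so every diagnosis remains admissible. In this representation, extending the model with a new rule amounts to appending one more pair to `rules`.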

The constructed model makes it possible to exclude insignificant signs and divide the data into classes according to diagnoses, while the same diagnosis can be characterized by different combinations of symptoms. As a result, a logical expression is obtained from m+n variables, where m is the number of patients, and n is the number of signs.

Such a model establishes rules for matching combinations of signs and diagnoses. It is true for all admissible combinations of signs and diagnoses and false only when a combination contradicts a previously established dependency. The model is easy to modify: a new rule is added by logical multiplication, which makes it possible to take additional data or situations into account. The result is a flexible system that can be adjusted and expanded ^[10].

This logical model can be represented as a recursive function. At the same time, each specific rule can cause other rules or auxiliary subfunctions to establish a final diagnosis. Such a hierarchical structure provides greater flexibility and efficiency of data processing, since it allows you to make decisions based on chains of conclusions. Thanks to the possibility of modification and extension of the function, as well as the use of a recursive format, it is possible to create adaptive diagnostic systems.

They are able to take into account various combinations of symptoms and make more accurate and informed decisions to identify the most likely diagnosis. This architecture provides flexibility and the ability to continuously improve the model when new data is received.

Here W(X) is the function being modeled, and the recursion also involves the characteristic of the objects at the current moment and the state of the system at the current moment ^[12].

If a logical function is represented as a minimized disjunctive normal form (MDNF), it gives a compact description of the data. Such a function encodes the possible variants of diagnoses; the classes of diagnoses that combine them based on the similarity of symptoms; and the combinations of signs that are not characteristic of the diagnoses under consideration.

The advantage of MDNF is a compact and unambiguous representation of knowledge. This will reduce the size of the model ^[11].
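The MDNF construction itself is not reproduced here. As a much simpler stand-in that illustrates only the idea of dropping signs that carry no information for the diagnosis, the sketch below greedily removes a sign whenever the remaining signs still separate all differently diagnosed cases. The function names and the toy data are assumptions, and greedy elimination does not in general find the smallest possible set.

```python
from typing import Dict, List, Sequence, Tuple

Signs = Tuple[int, ...]
Case = Tuple[Signs, str]  # (coded signs, diagnosis)


def consistent(cases: Sequence[Case], keep: Sequence[int]) -> bool:
    """True if no two cases agree on the kept signs but differ in diagnosis."""
    seen: Dict[Signs, str] = {}
    for signs, diagnosis in cases:
        key = tuple(signs[j] for j in keep)
        if seen.setdefault(key, diagnosis) != diagnosis:
            return False
    return True


def drop_redundant_signs(cases: Sequence[Case], n_signs: int) -> List[int]:
    """Greedily discard signs that are not needed to separate the diagnoses."""
    keep = list(range(n_signs))
    for j in range(n_signs):
        trial = [i for i in keep if i != j]
        if consistent(cases, trial):
            keep = trial  # sign j was not needed
    return keep


cases = [
    ((0, 0, 1), "norm"),
    ((1, 0, 1), "CGP"),
    ((1, 1, 1), "CGA"),
    ((1, 1, 0), "CGA"),
]
kept = drop_redundant_signs(cases, 3)
```

In the toy data the third sign can be removed, because the first two already separate the three diagnoses; removing either of the first two would merge cases with different diagnoses.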

A fragment of the internal program representation is shown in Figure 2.

Figure 2. Fragment of the program representation of the function

With large amounts of data, direct representation in the form of DNF can become cumbersome. Therefore, it is advisable to use the algorithm below.

3. Discussion

Data about each patient are presented in the form of a table, where the columns correspond to the questionnaire questions and possible answers, and the rows correspond to patients, groups (classes) of patients and their diagnoses. The values of symptoms and examination results for each patient are recorded in the corresponding cells of the table. Diagnoses are encoded as numbers and are also placed in the table, in columns corresponding to the characteristics of the patient. k-valued logical predicates are used to describe the dependencies between symptoms and diagnoses. These predicates are formalized as a system of production rules that relate the data (symptoms) to diagnoses.

Such structuring makes it possible to formalize knowledge about dependencies within the framework of the task of recognizing gastritis diagnoses.

Set of symptoms 1 → Solution 1,

Set of symptoms 2 → Solution 2,

...

Set of symptoms m → Solution m.

It is important to note that the same solution can come from different sets of input data.
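Because several sets of symptoms can lead to the same solution, it is natural to group the rules by solution and look at what the members of each group share. The sketch below is one hypothetical way to do this; the names `group_by_solution` and `common_signs` and the toy rules are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Signs = Tuple[int, ...]


def group_by_solution(rules: Sequence[Tuple[Signs, str]]) -> Dict[str, List[Signs]]:
    """Collect all sign sets that lead to the same solution."""
    classes: Dict[str, List[Signs]] = defaultdict(list)
    for signs, solution in rules:
        classes[solution].append(signs)
    return dict(classes)


def common_signs(sign_sets: Sequence[Signs]) -> Dict[int, int]:
    """Positions (and values) on which every sign set of a class agrees."""
    first = sign_sets[0]
    return {
        j: value
        for j, value in enumerate(first)
        if all(signs[j] == value for signs in sign_sets)
    }


rules = [((1, 0, 2), "CGA"), ((1, 1, 2), "CGA"), ((0, 0, 1), "norm")]
classes = group_by_solution(rules)
shared = common_signs(classes["CGA"])
```

Here the two "CGA" rules differ only in the second sign, so `shared` keeps the first and third positions as the signs common to that class.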

The constructed system of production rules (implicative statements) can be transformed into an optimal logical expression by logical transformations.

This allows you to remove redundant information and identify all possible equivalent classes of solutions. Thus, it is possible to identify hidden patterns in the data.

Program Description:

This program implements the algorithm described earlier and consists of two executable modules:

Module 1: Decoding the database and analyzing the results.

This module decodes the database using a dictionary.

It loads symptoms and diagnoses in the form of question-answer pairs.

It analyzes the results using the described algorithm.
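The decoding step of Module 1 might look like the sketch below. The dictionary layout, the placeholder question names `sign_1` and `sign_2`, their answer texts and the function `decode_record` are all assumptions made for illustration; the article does not describe the actual database format.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical dictionary: one (question, {code: answer}) entry per sign.
DICTIONARY: List[Tuple[str, Dict[int, str]]] = [
    ("sign_1", {0: "absent", 1: "present"}),
    ("sign_2", {0: "none", 1: "mild", 2: "moderate", 3: "severe"}),
]


def decode_record(codes: Sequence[int],
                  dictionary: Sequence[Tuple[str, Dict[int, str]]]
                  ) -> List[Tuple[str, str]]:
    """Turn a coded record into a list of (question, answer) pairs."""
    if len(codes) != len(dictionary):
        raise ValueError("record length does not match the dictionary")
    pairs = []
    for code, (question, answers) in zip(codes, dictionary):
        if code not in answers:
            raise ValueError(f"unknown code {code} for {question}")
        pairs.append((question, answers[code]))
    return pairs


decoded = decode_record([1, 2], DICTIONARY)
```

Rejecting unknown codes at this stage keeps malformed records out of the rule base instead of letting them surface later as spurious rules.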

Module 2: Creating the Knowledge knowledge base.

This module forms and improves the knowledge base. Based on the information from the source data, it creates an initial version of the knowledge system. It refines and replenishes the knowledge of an existing database and can optimize the amount of stored information depending on a given level of approximation ^[12,13].

To obtain the diagnostic result with the specified accuracy, it is necessary to fill in all the fields indicated in Figure 1.

If the specified accuracy cannot be achieved, a corresponding message is displayed, as shown in Figure 3.

Figure 3. Diagnostic result
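How the program measures accuracy is not specified in the text. The sketch below assumes one simple possibility: the score of a diagnosis is the largest fraction of signs on which the new patient agrees with any stored case of that diagnosis. If the best score reaches the required level, the diagnosis is returned; otherwise the ranked alternatives are returned. The function `diagnose`, the scoring rule and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Signs = Tuple[int, ...]


def diagnose(rules: Sequence[Tuple[Signs, str]], signs: Signs,
             required: float) -> Dict[str, object]:
    """Return a diagnosis if its score reaches `required`, else alternatives."""
    if not rules:
        raise ValueError("the rule base is empty")
    scores: Dict[str, float] = {}
    for rule_signs, diagnosis in rules:
        # Fraction of positions on which the patient matches this stored case.
        score = sum(a == b for a, b in zip(rule_signs, signs)) / len(signs)
        scores[diagnosis] = max(score, scores.get(diagnosis, 0.0))
    ranked: List[Tuple[str, float]] = sorted(
        scores.items(), key=lambda item: item[1], reverse=True
    )
    best_diagnosis, best_score = ranked[0]
    result: Optional[str] = best_diagnosis if best_score >= required else None
    return {
        "diagnosis": result,
        "score": best_score,
        "alternatives": [] if result is not None else ranked,
    }


rules = [((1, 0, 2, 1), "CGA"), ((0, 0, 1, 1), "norm")]
exact = diagnose(rules, (1, 0, 2, 1), required=0.9)
unclear = diagnose(rules, (1, 0, 1, 0), required=0.9)
```

In the second call neither diagnosis reaches the required level, so no diagnosis is returned and both candidates are listed as alternatives for the doctor to consider.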

Conclusion

In the course of the study, a software system was developed for the diagnosis of gastritis based on the logical analysis of medical data.

The proposed method of analysis makes it possible to identify hidden dependencies, classify data and highlight the unique features of each diagnosis. Unlike neural networks, the logical approach is more interpretable and does not require additional training.

Logical algorithms are an effective tool for data mining. They consider the initial information as a set of general patterns, from which a minimally sufficient set of rules can be identified to explain all observations. These rules also allow for a better understanding of the processes being studied.

The developed system can become a useful tool for gastroenterologists, providing informed decision-making based on logical understanding of the data and forming an integrative view of the diagnostic problem.

References

1. Zhuravljov, Ju. I. (1978). Ob algebraicheskom podhode k resheniju zadach raspoznavanija ili klassifikacii. Problemy kibernetiki, 33, 5–68.
2. Shibzukhov, Z.M. (2014). Correct Aggregation Operations with Algorithms. Pattern Recognition and Image Analysis, 24(3), 377-382.
3. Naimi, A. I. & Balzer, L. B. (2018). Stacked generalization: an introduction to super learning. European Journal of Epidemiology, 33, 459-464.
4. Haoxiang, W. & Smith S. (2021). Big data analysis and perturbation using a data mining algorithm. Journal of Soft Computing Paradigm, 3–01, 19-28.
5. Joe, M. & Vijesh, J. S. (2021). User Recommendation System Dependent on Location-Based Orientation Context. Journal of Trends in Computer Science and Smart Technology, 3-01, 14-23.
6. Grabisch, M. & Marichal, J.L. & Pap, E. (2009). Aggregation functions. Cambridge University Press, 127,13-27.
7. Calvo, T. & Beliakov, G. (2010). Aggregation functions based on penalties. Fuzzy Sets and Systems, 161(10), 1420-1436.
8. Mesiar, R., Komornikova, M., Kolesarova, A. & Calvo, T. (2008). Fuzzy aggregation functions: a revision. Sets and their extensions: representation, aggregation and models. Berlin:Springer-Verlag.
9. Yang, F., Yang, Zh. & Cohen, W.W. (2017). Differentiable learning of logical rules for knowledge base reasoning. Advances in Neural Information Processing Systems, 3, 2320-2329.
10. Akhlakur, R. & Sumaira, T. (2014). Ensemble classifiers and their applications: a review. International Journal of Computer Trends and Technologies, 10, 31-35.
11. Lyutikova, L.A. & Shmatova, E.V. (2020). Algorithm for constructing logical operations to identify patterns in data. E3S Web of Conferences, 3, 217-222.

First Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

This article is about the possibilities of using artificial intelligence for medical diagnostics. For some reason, the author uses the term "machine learning", which is now obsolete. But that's not the point. The fact is that such work is really needed and it is important. And the most important thing is that the introduction of artificial intelligence into the practice of medical diagnostics cannot be stopped, whether someone likes it or not. The process is underway and it is important to set priorities in this regard. The author writes that medical diagnostics actively uses machine learning methods that make it possible to effectively analyze data and support the diagnosis process. Yes, it is. It is proposed to use logical methods to build a diagnostic model, since models based on logical rules and constraints are more understandable and transparent for doctors/specialists compared to the "black boxes" of deep learning. In this place, the substitution of concepts is obvious. Speaking about the use of logical methods, the author means the logic of artificial intelligence. But there is also the concept of diagnostic logic as a process for determining the etiology and pathogenesis of the disease, differential diagnostic differences, individual characteristics of the pathological process, etc. This is medical logic. Therefore, the author in this article does not write about the logic that is taught in medical schools. This is a conflict. Therefore, all further presentation of the material in this text should be understood as the process of building a diagnostic machine algorithm, which may have some relation to medical diagnostics. Or maybe it doesn't. Such features relate to additional and auxiliary diagnostic tools. Considering them in the general diagnostic process, it is important for the doctor not to make a mistake. But there is still no data in the literature on the risks of using such machine diagnostic models. 
Everyone writes about the same thing, like the author of this article, that such models will only make the doctor's work easier, mistakenly believing that the doctor is very tired of diagnosis. This is not true. It's not the diagnosis that gets tired of the doctor. However, the aim of this study "is to create a machine learning model to support the diagnosis of gastritis based on available patient data." The author's idea is clear and one can agree in principle with such a formulation of the goal. Further in the text it is said that 17 types of gastritis have been identified. What are these 17 types? What are they for? Is this at the request of doctors or at the initiative of the model developer? Doctors do not need to impose any new types of gastritis and thus oblige them to rebuild the diagnostic process taking into account machine logic. It should be exactly the opposite. And the author already writes that "it is necessary to build an algorithm capable of classifying new patients based on these data and determining a probable diagnosis in order to help doctors in diagnosis." What kind of "this data" is it? Is it for these 17 types? And then, therefore, it is necessary to send doctors to teach knowledge of machine logic? This approach is not correct in principle. And this is because the author does not rely on any reasonable methodological principles of medical diagnosis. Diagnostic doctors should be helped, not hindered. The formulas given below are of no interest to the potential reader and the author cited them in vain in the text. They do not have any argumentative value. Articles with such content should be considered in specialized journals. But still, we must pay tribute to the author in the sense that he adheres to a correct understanding of the relationship between man and machine. 
Thus, the article notes that it is necessary to take into account that "an experienced specialist can provide information about the diagnosis of gastritis based on clinical thinking, which may be more complete and objective compared to automatic data analysis." Nevertheless, the formalized logical approach makes it possible to identify "objective statistical patterns in the data and identify the most significant signs for diagnosis. At the same time, it is necessary to take into account the valuable practical experience of doctors and their expert knowledge about diagnostic possibilities." This is how the author writes correctly, and this indicates that he will most likely be able to find mutual understanding with the medical community in matters not only of using machine models in the diagnostic process, but, more importantly, in the process of developing such models. In this sense, the prospect of a psychological understanding of the critical aspects of this process opens up. It is necessary to take into account the healthy conservatism of the medical diagnostics system and the need to introduce new models based on machine logic into it. All this will contribute to the rationalization of the diagnostic process by setting diagnostic priorities. It is impossible to say what this arrangement will be before generalizing the experience of using diagnostic machine algorithms. By the way, the reviewer does not exclude that in some cases the use of machine intelligence may well be a priority. Therefore, this article can be recommended for publication after finalizing the text and reworking the bibliographic list, including sources that are more understandable not only to doctors, but also to psychologists.

Second Peer Review

The subject of the research in the reviewed article is logical modeling and its application for the analysis and classification of medical data for the purpose of diagnosis. The methodology of the study is based on the processing of statistical data on the results of histological examination of gastric biopsies of 132 patients, conducted between 2019 and 2022 in the Pathology Bureau of a public health institution, and on the use of modeling and machine learning methods. The authors rightly attribute the relevance of the work to the fact that machine learning methods effectively analyze data and support the process of making a medical diagnosis, and the algorithms used independently find signs and build models based on training data. The scientific novelty of the reviewed study, according to the reviewer, consists in the machine learning model developed by the authors of the article to support the diagnosis of gastritis based on the logical analysis of medical data. The following sections are structurally highlighted in the article: Introduction, Materials and methods, Results, Discussion, Conclusion, Bibliography. The article describes the construction of an algorithm capable of classifying new patients according to the results of histological examination, used as the basis for a training sample, and determining the diagnosis. The mathematical formulation of the problem boils down to finding a function of 28 variables, defined at 132 points, where the domain of each variable contains from 2 to 4 options, and restoring the values of the function at other requested points. The authors believe that logical analysis of the data makes it possible to identify patterns in them and obtain a set of logical statements that fully describe these patterns, which makes it possible to classify existing diagnoses and divide them into groups depending on the similarity of symptoms.
The machine learning model makes it possible to exclude non-essential features and divide the data into classes by diagnosis. The program proposed by the authors includes two modules: "Decoding the database and analyzing the results" and "Creating a knowledge base". The text of the article is illustrated with three figures ("Fragment of the questionnaire of input data", "Fragment of the program representation of the function", "Diagnostic result") and contains four formulas. In conclusion, the authors state that logical algorithms are an effective tool for data mining, and that the developed system can become a useful tool for gastroenterologists, ensuring informed decision-making. The bibliographic list includes 11 sources, publications of foreign and domestic scientists on the topic of the article, to which there are targeted links in the text confirming the existence of an appeal to opponents. As a comment, it should be noted that there are typos in the text, for example, "and formations" in the final sentence of the article. The article reflects the results of the research conducted by the authors, corresponds to the direction of the journal "Software Systems and Computational Methods", contains elements of scientific novelty and practical significance, may arouse interest among readers, and is recommended for publication.
