Software systems and computational methods
Reference:
Lyutikova L.A.
Application of logical modeling for the analysis and classification of medical data for the purpose of diagnosis.
// Software systems and computational methods.
2023. No. 4.
P. 61-72.
DOI: 10.7256/2454-0714.2023.4.68876 EDN: KIUUOL URL: https://en.nbpublish.com/library_read_article.php?id=68876
Application of logical modeling for the analysis and classification of medical data for the purpose of diagnosis.
DOI: 10.7256/2454-0714.2023.4.68876
EDN: KIUUOL
Received: 03-11-2023
Published: 25-11-2023
Abstract: The subject of the research is a logical approach to data analysis and the development of software tools capable of identifying hidden patterns even with a limited amount of data. The input data consist of patients' diagnostic indicators, their diagnoses and the experience of doctors gained in medical practice. The research method is the development of software tools based on systems of multivalued predicate logic for the analysis of patient data. This approach treats the source data as a set of general rules, among which it is possible to single out those that are sufficient to explain all the observed data. These rules, in turn, are generative for the area under consideration and help to better understand the nature of the objects under study. The novelty of the study lies in the use of multivalued logic to analyze a limited amount of patient data in order to determine the most likely diagnosis with a given accuracy. The proposed approach makes it possible to detect hidden patterns in symptoms and examination results, classify them and identify the unique signs of various forms of gastritis. Unlike neural networks, logical analysis is transparent and does not require training on large amounts of data. The conclusions of the study show that such an approach can be used for diagnosis when information is lacking, and that it can offer alternatives if the required diagnostic accuracy is not achieved.
Keywords: diagnostics, communications, multivalued logic, data, analysis, hidden patterns, classifier, production rules, training, properties
This article is automatically translated.
Introduction
Medical diagnostics actively uses machine learning methods, which analyze data efficiently and support the diagnostic process. One of their advantages is the ability to process large amounts of information and find hidden patterns that help predict diseases and make decisions. Unlike manually coded rules, the algorithms find features on their own and build models from training data. Existing methods can be grouped by task: classification of patients by symptoms for the diagnosis of specific diseases; clustering to identify subgroups of patients with similar features and to select treatment individually; prognosis of the course of a disease, risk assessment and choice of treatment; and analysis of medical images and signals using deep learning.
However, machine learning in medical diagnostics runs into several difficulties. The quality of solutions can suffer from a shortage of high-quality training data, since much medical data is confidential and collecting it takes considerable effort. In some cases the model overfits: it works well on the training sample but generalizes poorly to new data. Sometimes the system must be updated continually as new knowledge and data become available [1-4].
In view of these problems, this paper proposes using logical methods to build a diagnostic model. Models based on logical rules and constraints are more understandable and transparent for doctors and specialists than the "black boxes" of deep learning. Logical methods make it possible to take expert knowledge into account and to draw conclusions from incomplete data, whereas neural networks require large amounts of data. Logical models are also easy to adjust and extend as new knowledge is gained, unlike regression models or multilayer perceptrons (MLP).
1. Materials and methods
The aim of the study is to create a machine learning model to support the diagnosis of gastritis from available patient data. Gastritis is a common disease that affects up to 20% of the adult population in developed countries, and its incidence among children is also high. Accurate identification of chronic gastritis requires histological examination of biopsies of the gastric mucosa, since this reveals morphological changes, while the clinical picture may be uninformative. In 1990, the Sydney Gastritis Classification System was adopted at the IX International Congress of Gastroenterologists. In 1996, the final version of the modified Sydney system was published, known as "Classification and Gradation of Gastritis. Modified Sydney System". According to this classification, there are three main categories of gastritis: acute, chronic and special forms.
For our study, we used the results of histological examination of gastric biopsies from 132 patients, performed from 2019 to 2022 at the Pathology Bureau belonging to the State Healthcare Institution. These data are valuable source material for the research: they make it possible to analyze various pathological conditions of the stomach, identify links between factors and pathologies, and develop diagnostic models and algorithms. Using the results of histological examination as the training sample makes it possible to build a reliable and informative model that can be applied in future research and clinical practice for more accurate and effective diagnosis of gastric pathologies. This should help improve the diagnostic process and the development of more effective treatments for gastritis. The task is to build an algorithm that classifies new patients on the basis of these data and determines a probable diagnosis to assist doctors.
Given the small amount of data, a logical approach to machine learning was chosen, where logic is understood as the science of correct reasoning regardless of its field of application. For this task the approach has a number of advantages over other methods: the resulting models are easy to interpret and explain, it works effectively with small samples, it integrates expert knowledge and rules, and it copes better with noise and outliers in the data. The model is presented as a set of logical rules that determine the most likely diagnosis of gastritis from symptoms and examination results, which should facilitate the work of doctors [5-6]. As an illustration, Figure 1 shows a fragment of the standardized form listing the symptoms and examination results that were taken into account when establishing the diagnosis of gastritis. The form records the clinical picture of a particular patient and is part of the initial data on which the proposed model is trained. Having a doctor fill out such a form structures the information about the patient and the symptoms of the disease.
Figure 1. Fragment of the questionnaire of input data
Mathematical formulation: it is necessary to find a function of 28 variables that is defined at 132 points, where each variable takes from 2 to 4 possible values, and to restore the value of the function at other requested points. The solution is to extract patterns from the existing examples with an established diagnosis and to build a model from the analysis of past cases. The model should take into account the characteristics of previous patients and the relationship between symptoms and diagnosis, and on this basis classify new cases. Thus, the problem is formulated as the extraction of knowledge from an existing set of previously solved similar problems in order to predict solutions for new problems of the same class.
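The formulation above treats the training sample as a partial function on a finite grid of sign values. A minimal sketch of that view follows; the three-sign domain and the four observations are made up for illustration (the study itself uses 28 signs and 132 patients), and all names are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical domain: number of admissible values per sign (2 to 4 each).
DOMAIN_SIZES = (2, 3, 4)

# Hypothetical observations: tuple of sign values -> diagnosis label.
OBSERVED = {
    (0, 0, 0): "norm",
    (1, 2, 3): "CGA",
    (1, 1, 0): "CGP",
    (0, 2, 1): "CGP",
}


def grid_size(sizes):
    """Number of points on which the function could be defined."""
    return prod(sizes)


def lookup(point):
    """Return the recorded diagnosis, or None where the function is undefined."""
    return OBSERVED.get(tuple(point))


def undefined_points(sizes, observed):
    """Grid points at which the diagnosis still has to be restored."""
    grid = product(*(range(k) for k in sizes))
    return [p for p in grid if p not in observed]
```

With 28 signs of at least two values each, the grid has at least 2^28 (about 2.7 × 10^8) points, of which only 132 are observed, so almost all of the function has to be restored from patterns rather than looked up.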
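Let $X = \{x_1, x_2, \dots, x_n\}$ be the set of symptoms of the diagnosed diseases and $Y = \{y_1, y_2, \dots, y_m\}$ the possible diagnoses; each diagnosis $y_i$ is characterized by a corresponding set of symptoms $x_1(y_i), \dots, x_n(y_i)$. This can be represented in the following form:
$$\begin{pmatrix} x_1(y_1) & x_2(y_1) & \dots & x_n(y_1) \\ x_1(y_2) & x_2(y_2) & \dots & x_n(y_2) \\ \vdots & \vdots & & \vdots \\ x_1(y_m) & x_2(y_m) & \dots & x_n(y_m) \end{pmatrix} \to \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$$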
It should be borne in mind that an experienced specialist can provide information about the diagnosis of gastritis based on clinical thinking, and this information may be more complete and objective than automatic data analysis. Nevertheless, the formalized logical approach makes it possible to find objective statistical patterns in the data and to identify the signs that matter most for diagnosis. At the same time, the practical experience of doctors and their expert knowledge of diagnostic capabilities must be taken into account [7]. The best results can be achieved by combining logical data analysis with the expertise of medical specialists. This yields a model based on both statistical patterns and deep knowledge of the field, and leads to a more accurate understanding of the process of diagnosing gastritis [8].
2. Results
Medical diagnostics often has to work with incomplete and contradictory information. Logical analysis of the data identifies both explicit and hidden patterns established statistically. This makes it possible to determine a minimally sufficient set of features that explains all the observed patterns. Under the direct supervision of doctors, a histological examination card (CGI) was developed. The CGI is a tool for the diagnosis and analysis of pathological conditions, developed jointly with pathologists. It consists of two parts. The first part contains information about the patient, such as last name, initials, gender and age, as well as the date and number of the study. It also includes 28 diagnostic signs that are established during histological examination. These signs are arranged in a fixed sequence and represent a variant of the original diagnostic algorithm. The doctor conducting the study records the values of these signs in the CGI in strict sequence. The second part, called "Diagnosis", contains the main target signs.
The main target feature is the diagnosis itself, which can take one of three values: "norm", "chronic superficial gastritis (CGP)" or "chronic atrophic gastritis (CGA)". Depending on the chosen diagnosis, up to 9 additional signs can be included, such as topography, etiology and activity. The CGI systematizes and standardizes the diagnosis of histological samples and provides a unified approach to the assessment and classification of pathological conditions. It gives doctors complete information about patients and their histological data, which contributes to a more accurate and reliable diagnosis. The CGI can also be used in research and data analysis to identify links between signs and pathological conditions.
The goal is to build a model from statistical analysis of the available medical data. Such a model is more compact than the original data set, more reliable and faster to process. A system of rules is considered complete if all available solutions can be reproduced from it. It is convenient to designate a group of diagnoses that share common signs or symptoms as a class. Each diagnosis can be a representative of one or more classes, and each class is determined by a set of similar symptoms [9]. Logical analysis of the data identifies patterns in them and yields a set of logical statements (rules) that fully describe these patterns. This makes it possible to classify existing diagnoses and divide them into groups according to the similarity of symptoms. For each individual patient, a deterministic condition (rule) can be formulated that relates the presence of a certain combination of symptoms to a specific diagnosis:
$$\bigwedge_{j=1}^{n} x_j(y_i) \to P(y_i), \quad i = 1, \dots, m,$$
where $x_j(y_i)$ is the value of the $j$-th sign recorded for the $i$-th patient and $y_i$ is that patient's diagnosis. This is a general production rule in which the predicate $P(y_i)$ takes the value true, i.e. $P(y_i) = 1$ if $y = y_i$, and $P(y_i) = 0$ if $y \neq y_i$. This rule can be written in another form:
$$\bigvee_{j=1}^{n} \overline{x_j(y_i)} \vee P(y_i), \quad i = 1, \dots, m.$$
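The equivalence between the implication form of a production rule and its disjunctive form can be checked exhaustively on a small domain. The sketch below assumes a hypothetical three-sign domain and a single made-up patient record; the function names are illustrative and not taken from the described program.

```python
from itertools import product


def make_rule(pattern, diagnosis):
    """Implication form: (all signs match the pattern) -> (y is the diagnosis)."""
    def rule(signs, y):
        matches = all(s == p for s, p in zip(signs, pattern))
        return (not matches) or (y == diagnosis)
    return rule


def make_clause(pattern, diagnosis):
    """Disjunctive form: some sign differs from the pattern, or y is the diagnosis."""
    def clause(signs, y):
        return any(s != p for s, p in zip(signs, pattern)) or (y == diagnosis)
    return clause


# Hypothetical domain sizes and diagnosis labels.
SIZES = (2, 3, 2)
DIAGNOSES = ("norm", "CGP", "CGA")

rule = make_rule((1, 2, 0), "CGA")
clause = make_clause((1, 2, 0), "CGA")

# Compare both forms on every combination of sign values and diagnoses.
equivalent = all(
    rule(x, y) == clause(x, y)
    for x in product(*(range(k) for k in SIZES))
    for y in DIAGNOSES
)
```

The rule is false only when the signs match the recorded pattern and a different diagnosis is proposed; for every other input both forms are true.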
A generalizing model for an existing data set can be constructed as the logical product (conjunction) of all individual classification rules. The result is a single logical expression that covers every recorded correspondence between a combination of signs and a diagnosis for all objects in the sample. This describes the relationship between signs and diagnoses for the entire data set as one function, combining the particular rules into a holistic model. The conjunctive function obtained in this way fully accounts for the relationships present in the considered set of medical data.
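In symbols, the conjunction described above can be sketched as follows. The notation is an assumption made for this illustration: $x_j(y_i)$ stands for the value of the $j$-th sign of the $i$-th patient, $P(y_i)$ for the predicate "the diagnosis is $y_i$", $n$ for the number of signs and $m$ for the number of patients. It is a reconstruction from the verbal description, not a formula quoted from the source.

```latex
f(X) = \bigwedge_{i=1}^{m} \left( \bigvee_{j=1}^{n} \overline{x_j(y_i)} \vee P(y_i) \right)
```

Each factor is the disjunctive form of one patient's production rule, so $f(X)$ is false exactly when some recorded combination of signs is paired with a diagnosis other than the recorded one.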
The constructed model makes it possible to exclude insignificant signs and to divide the data into classes according to diagnoses, while the same diagnosis can be characterized by different combinations of symptoms. The result is a logical expression in m+n variables, where m is the number of patients and n is the number of signs. Such a model establishes rules for matching combinations of signs with diagnoses: it is true for all admissible combinations of signs and diagnoses and false only when a combination contradicts a previously established dependency. The model is easy to modify by adding new rules through logical multiplication, which makes it possible to take additional data or situations into account. Thus, a flexible system is obtained that can be adjusted and expanded [10].
This logical model can be represented as a recursive function, in which each specific rule can call other rules or auxiliary subfunctions to establish a final diagnosis. Such a hierarchical structure makes data processing more flexible and efficient, since decisions are made through chains of inference. Because the function can be modified and extended, and because of its recursive format, it is possible to create adaptive diagnostic systems that take various combinations of symptoms into account and make more accurate and better-founded decisions about the most likely diagnosis. This architecture allows the model to be improved continuously as new data are received. Here W(X) is the function being modeled, and its other two arguments are the characteristics of the objects at the current step and the state of the system at the current step [12].
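One way to read the recursive, extensible structure described above is that the model after j rules is the model after j-1 rules multiplied by the j-th rule. The following sketch assumes exactly that reading; the rule format, the data and the function names are hypothetical and are not the author's implementation.

```python
def rule_holds(rule, signs, y):
    """One production rule in disjunctive form: signs differ or diagnosis agrees."""
    pattern, diagnosis = rule
    return tuple(signs) != tuple(pattern) or y == diagnosis


def evaluate(rules, signs, y):
    """Recursive conjunction: value after j rules = value after j-1 rules AND rule j."""
    if not rules:
        return True  # the empty conjunction is true
    return evaluate(rules[:-1], signs, y) and rule_holds(rules[-1], signs, y)


def consistent_diagnoses(rules, signs, candidates):
    """Diagnoses that do not contradict any established rule."""
    return [y for y in candidates if evaluate(rules, signs, y)]


# Hypothetical data: (sign pattern, diagnosis) pairs.
DIAGNOSES = ("norm", "CGP", "CGA")
RULES = [((0, 0, 0), "norm"), ((1, 2, 0), "CGA")]

# Extending the model is one more logical multiplication.
EXTENDED = RULES + [((1, 1, 1), "CGP")]
```

In this sketch an unseen combination of signs leaves every diagnosis admissible, because the conjunction is false only where an established rule is contradicted; after the new rule is added, the same combination admits a single diagnosis.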
If the logical function is represented as a minimized disjunctive normal form (MDNF), the data can be described compactly. Such a function encodes the possible variants of diagnoses, the classes of diagnoses that combine them by similarity of symptoms, and the combinations of signs that are not characteristic of the diagnoses under consideration. The advantage of the MDNF is a compact and unambiguous representation of knowledge, which reduces the size of the model [11]. A fragment of the internal program representation is shown in Figure 2.
Figure 2. Fragment of the program representation of the function
With large amounts of data, direct representation in the form of a DNF can become cumbersome. Therefore, it is advisable to use the algorithm below.
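The author's algorithm is not reproduced in this text. Purely as an illustration of what excluding insignificant signs means, and not as the method of the paper, the following sketch searches by brute force for the smallest subsets of signs that still separate the diagnoses in a made-up sample. It is not a DNF minimization procedure, and all names and data are hypothetical.

```python
from itertools import combinations

# Hypothetical sample: (values of four signs, diagnosis).
SAMPLE = [
    ((0, 0, 0, 1), "norm"),
    ((0, 1, 1, 1), "norm"),
    ((1, 0, 0, 0), "CGP"),
    ((1, 1, 0, 0), "CGP"),
    ((1, 0, 1, 1), "CGA"),
    ((1, 1, 1, 0), "CGA"),
]


def separates(sample, signs):
    """True if no two records with different diagnoses agree on all chosen signs."""
    seen = {}
    for values, diagnosis in sample:
        key = tuple(values[i] for i in signs)
        # setdefault stores the first diagnosis seen for this key and returns it.
        if seen.setdefault(key, diagnosis) != diagnosis:
            return False
    return True


def minimal_sign_sets(sample):
    """All smallest subsets of sign indices that still determine the diagnosis."""
    n = len(sample[0][0])
    for size in range(n + 1):
        found = [c for c in combinations(range(n), size) if separates(sample, c)]
        if found:
            return found
    return []
```

For this sample, `minimal_sign_sets(SAMPLE)` returns `[(0, 2)]`: the first and third signs alone determine the diagnosis, so the other two can be dropped without losing any recorded case. The exhaustive search is exponential in the number of signs and is only meant for small examples.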
3. Discussion
Data about each patient are presented in the form of a table in which the columns correspond to the questionnaire questions and possible answers, and the rows correspond to patients, groups (classes) of patients and their diagnoses. The values of symptoms and examination results for each patient are recorded in the corresponding cells of the table. Diagnoses are encoded as numbers and are also placed in the table, in the columns corresponding to the characteristics of the patient. k-valued logical predicates are used to describe the dependencies between symptoms and diagnoses. These predicates are formalized as a system of production rules relating the data (symptoms) to diagnoses. Such structuring formalizes the knowledge about dependencies within the task of recognizing gastritis diagnoses:
Set of symptoms 1 → Solution 1,
Set of symptoms 2 → Solution 2,
...
Set of symptoms m → Solution m.
It is important to note that the same solution can follow from different sets of input data. The constructed system of production rules (implicative statements) can be transformed into an optimal logical expression. This removes redundant information and identifies all possible equivalent classes of solutions, and thus reveals hidden patterns in the data.
Program description: The program implements the algorithm described earlier and consists of two executable modules.
Module 1: Decoding the database and analyzing the results. This module decodes the database using a dictionary, loads symptoms and diagnoses in the form of question-answer pairs, and analyzes the results using the described algorithm.
Module 2: Creating the Knoyledge knowledge base. This module forms and improves the knowledge base. From the source data it creates an initial version of the knowledge system.
It also refines and extends the knowledge in an existing base, and can optimize the amount of stored information depending on a given level of approximation.
To obtain a diagnostic result with the specified accuracy, all the fields indicated in Figure 1 must be filled in. If the specified accuracy cannot be achieved, a corresponding message is displayed, as shown in Figure 3.
Figure 3. Diagnostic result
Conclusion
In the course of the study, a software system was developed for the diagnosis of gastritis based on the logical analysis of medical data. The proposed method of analysis makes it possible to identify hidden dependencies, classify data and highlight the unique features of each diagnosis. Unlike neural networks, the logical approach is more interpretable and does not require additional training. Logical algorithms are an effective tool for data mining. They treat the initial information as a set of general patterns, from which a minimally sufficient set of rules can be identified to explain all observations. These rules also give a better understanding of the processes being studied. The developed system can become a useful tool for gastroenterologists, supporting informed decision-making based on a logical understanding of the data and forming an integrated view of the diagnostic problem.
References
1. Zhuravljov, Ju. I. (1978). Ob algebraicheskom podhode k resheniju zadach raspoznavanija ili klassifikacii. Problemy kibernetiki, 33, 5–68.
2. Shibzukhov, Z. M. (2014). Correct Aggregation Operations with Algorithms. Pattern Recognition and Image Analysis, 24(3), 377-382.
3. Naimi, A. I. & Balzer, L. A. (2018). Multilevel generalization: an introduction to super learning. European Journal of Epidemiology, 33, 459-464.
4. Haoxiang, W. & Smith, S. (2021). Big data analysis and perturbation using a data mining algorithm. Journal of Soft Computing Paradigm, 3-01, 19-28.
5. Joe, M. & Vijesh, J. S. (2021). User Recommendation System Dependent on Location-Based Orientation Context. Journal of Trends in Computer Science and Smart Technology, 3-01, 14-23.
6. Grabisch, M., Marichal, J. L. & Pap, E. (2009). Aggregation functions. Cambridge University Press, 127, 13-27.
7. Calvo, T. & Belyakov, G. (2010). Aggregating functions based on penalties. Fuzzy sets and systems, 10-161, 1420-1436.
8. Mesiar, R., Komornikova, M., Kolesarova, A. & Calvo, T. (2008). Fuzzy aggregation functions: a revision. Sets and their extensions: representation, aggregation and models. Berlin: Springer-Verlag.
9. Yang, F., Yang, Zh. & Cohen, W. W. (2017). Differentiable learning of logical rules for reasoning in the knowledge base. Advances in the field of neural information processing system, 3, 2320-2329.
10. Akhlakur, R. & Sumaira, T. (2014). Ensemble classifiers and their applications: a review. International Journal of Computer Trends and Technologies, 10, 31-35.
11. Lyutikova, L. A. & Shmatova, E. V. (2020). Algorithm for constructing logical operations to identify patterns in data. E3S Web of Conferences, 3, 217-222.