Pleshakova E.S., Gataullin S.T., Osipov A.V., Romanova E.V., Marun'ko A.S. —
Application of Thematic Modeling Methods in Text Topic Recognition Tasks to Detect Telephone Fraud
// Software systems and computational methods. – 2022. – № 3. – P. 14–27.
DOI: 10.7256/2454-0714.2022.3.38770
URL: https://en.e-notabene.ru/itmag/article_38770.html
Abstract: The Internet has emerged as a powerful infrastructure for worldwide communication and human interaction. Unethical uses of this technology, such as spam, phishing, trolling, cyberbullying and viruses, have complicated the development of mechanisms that guarantee affordable and safe opportunities for its use. Many studies are currently devoted to detecting spam and phishing. Detecting telephone fraud has become critically important because it causes huge losses. Machine learning and natural language processing algorithms are used to analyze large volumes of text data.
Fraudsters can be identified using text mining, which can be implemented by analyzing the terms of words or phrases. One of the difficult tasks is dividing this huge volume of unstructured data into clusters, and several thematic (topic) modeling methods exist for this purpose. This article presents the application of these models, in particular LDA, LSI and NMF. A dataset was compiled, a preliminary analysis of the data was carried out, and features were constructed for the models in the task of recognizing the topic of a text. Approaches to keyword extraction in text topic recognition tasks are considered, and their key concepts are given. The disadvantages of these models are shown, and directions for improving text processing algorithms are proposed. The quality of the models was evaluated, and the models were improved by selecting hyperparameters and changing the data preprocessing function.
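As a rough illustration of how one of the listed models can recover topics and their keywords, the sketch below factors a TF-IDF document-term matrix with NMF. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The four-sentence corpus is hypothetical (not the article's dataset). The smoothed IDF formula and the Lee-Seung multiplicative update rule are choices made here. The helper names `tfidf_matrix`, `nmf` and `top_terms` are illustrative. LDA and LSI are not shown.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus (NOT the article's dataset): two "fraud" themes.
docs = [
    "your bank card is blocked tell us the code from the sms",
    "bank security service asks for the card code from the sms",
    "you won a prize pay a small fee to receive the prize",
    "congratulations you won a car pay the fee to receive it",
]


def tfidf_matrix(docs):
    """Return a documents x terms TF-IDF matrix and the sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = []
        for term in vocab:
            tf = counts[term] / len(toks) if toks else 0.0
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF (assumed), always > 0
            row.append(tf * idf)
        matrix.append(row)
    return matrix, vocab


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x k) times H (k x m) using
    Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)  # fixed seed gives deterministic factors
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        # Elementwise: H <- H * (W^T V) / (W^T W H)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        # Elementwise: W <- W * (V H^T) / (W H H^T)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Reconstruction error ||V - WH||_F."""
    wh = matmul(w, h)
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb)))


def top_terms(h, vocab, n=3):
    """The highest-weighted vocabulary terms of each topic (row of H) act as topic keywords."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]] for row in h]


if __name__ == "__main__":
    v, vocab = tfidf_matrix(docs)
    w, h = nmf(v, k=2)
    for t, words in enumerate(top_terms(h, vocab)):
        print(f"topic {t}: {words}")
```

Each row of `h` weights the vocabulary for one topic, so its highest-weighted terms serve as that topic's keywords. Each row of `w` gives a document's topic mixture. In practice a library implementation (for example scikit-learn's `NMF` applied to the output of `TfidfVectorizer`) would replace these hand-written loops, and stop-word removal would be part of preprocessing.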