The spam-messages classification model in a medical information system

Keywords: spam messages, medical information systems, machine learning, natural language processing, text data classification

Abstract

Relevance. Modern medical information systems generate a significant number of text records every day from services, doctors and staff. To work well, such systems need models and methods for analyzing and classifying text data, in particular for detecting and blocking spam messages. The development, improvement and implementation of models and methods for classifying spam messages is therefore a relevant task.

Research objective: to increase the efficiency of spam message recognition in medical information systems by developing and implementing spam classification models based on machine learning methods.

Research methods: natural language processing methods, modeling, machine learning, classification methods, data analysis methods, statistical methods.

Results. Spam message classification models were built using three machine learning methods: logistic regression, the naive Bayes classifier and the support vector machine. The models were trained on the SMS Spam Collection dataset, preprocessed with CountVectorizer and TfidfVectorizer. All proposed models classified spam messages with high accuracy and correctly determined the message type.
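As an illustration of one of the methods named above, the following is a minimal sketch of a multinomial naive Bayes classifier over bag-of-words counts. It is not the authors' implementation: the whitespace tokenizer, the function names `train_naive_bayes` and `predict`, the smoothing value, and the four toy messages are all assumptions made for the example, and the counting is written by hand in plain Python rather than with CountVectorizer so that the sketch has no dependencies.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def train_naive_bayes(messages, labels, alpha=1.0):
    """Collect per-class document counts and per-class word counts."""
    class_docs = Counter(labels)
    word_counts = {label: Counter() for label in class_docs}
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return {
        "class_docs": class_docs,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,  # Laplace smoothing constant (assumed value)
    }


def predict(model, text):
    """Return the class with the highest log posterior for the message."""
    total_docs = sum(model["class_docs"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word not in model["vocab"]:
                continue  # ignore words never seen in training
            # Smoothed log likelihood of the word given the class
            score += math.log(
                (counts[word] + alpha) / (total_words + alpha * vocab_size)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data invented for this sketch; not taken from the SMS Spam Collection.
TOY_MESSAGES = [
    "win a free prize now",
    "free entry claim your prize",
    "your appointment is confirmed for monday",
    "lab results are ready for review",
]
TOY_LABELS = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(TOY_MESSAGES, TOY_LABELS)
```

Calling `predict(model, "claim your free prize")` on this toy model selects the spam class, because those words occur only or mostly in the spam examples. A real pipeline would replace the raw counts with the vectorizer output and train on the full dataset.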

Conclusions: The developed message classification models, based on machine learning and an NLP approach, successfully recognized unwanted messages. By the quality indicators, the best model was the support vector machine with TF-IDF vectorization, which showed the highest accuracy (98.75%) and a high recall (90.3%). Further refinement of the models and expansion of the training set may further improve the quality of spam recognition.
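The two quality indicators reported above measure different things: accuracy is the share of all messages classified correctly, while recall is the share of actual spam messages that the model catches. When spam is the minority class, accuracy can sit well above recall. The sketch below shows the arithmetic on confusion-matrix counts. The counts are hypothetical and chosen only to make the calculation easy to follow; they are not the paper's results.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all messages that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Share of actual spam messages that were flagged as spam."""
    return tp / (tp + fn)


# Hypothetical counts for 1000 messages, 100 of them spam (assumed values).
tp = 90   # spam correctly flagged
fn = 10   # spam missed
fp = 5    # legitimate messages wrongly flagged
tn = 895  # legitimate messages correctly passed

example_accuracy = accuracy(tp, tn, fp, fn)
example_recall = recall(tp, fn)
```

With these assumed counts, 985 of 1000 messages are classified correctly while 90 of 100 spam messages are caught, so a small number of missed spam messages lowers recall much more than it lowers accuracy.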

Author Biographies

Kateryna Volynets, V.N. Karazin Kharkiv National University, 6 Svobody sq., Kharkiv, Ukraine, 61022

Student at the Education and Research Institute of Computer Sciences and Artificial Intelligence

Viktoriia Strilets, V.N. Karazin Kharkiv National University, 6 Svobody sq., Kharkiv, Ukraine, 61022

Ph.D., Associate Professor of the Department of Computer Systems and Robotics, Education and Research Institute of Computer Sciences and Artificial Intelligence

Danylo Yakovlev, V.N. Karazin Kharkiv National University, 6 Svobody sq., Kharkiv, Ukraine, 61022

Student at the Education and Research Institute of Computer Sciences and Artificial Intelligence

Published
2024-11-25
How to Cite
Volynets, K., Strilets, V., & Yakovlev, D. (2024). The spam-messages classification model in a medical information system. Bulletin of V.N. Karazin Kharkiv National University, Series «Mathematical Modeling. Information Technology. Automated Control Systems», 64, 25-31. https://doi.org/10.26565/2304-6201-2024-64-03
Section
Articles