Machine Learning Approaches to Malware Detection in RAM

doi:10.26565/2304-6201-2025-67-07

Yevhen Lanin, V.N. Karazin Kharkiv National University, 6 Svobody sq., Kharkiv, Ukraine, 61022, https://orcid.org/0009-0003-2639-6218
Nina Bakumenko, V.N. Karazin Kharkiv National University, 6 Svobody sq., Kharkiv, Ukraine, 61022, https://orcid.org/0000-0003-3496-7167

Keywords: machine learning, memory dump analysis, malware detection, Random Forest, multi-class classification, pipeline, digital forensics, cybersecurity, Python

Abstract

Relevance. As cyber threats continue to grow, detecting malicious software that operates covertly in RAM using fileless attack techniques has become particularly relevant. Traditional antivirus solutions, which rely primarily on signature matching, prove ineffective against modern advanced persistent threats (APTs) and newly modified malware variants. This makes it essential to develop new approaches to malware detection that use machine learning to analyze behavioral patterns in RAM.

Goal. To develop and test an automated malware detection system based on RAM dump analysis with machine learning methods, and to comparatively evaluate the effectiveness of various classification algorithms for multi-class detection of threat types.

Research methods. Comparative analysis of machine learning algorithms, static analysis of memory dumps, multi-class classification, and experimental validation on the Obfuscated-MalMem2022 dataset, which contains over 58,000 records described by 58 Windows process features. Models were evaluated using accuracy, precision, recall, and F1-score, with weighted averaging across classes.
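
Weighted averaging computes each metric separately for every class and then averages the per-class values, weighting each class by its support (its number of true instances in the evaluation set). The minimal pure-Python sketch below illustrates that definition. The function names are hypothetical, the class labels in the usage example are toy values rather than results from this study, and scoring a metric as 0 when its denominator is empty is an assumed convention that mirrors scikit-learn's zero_division=0 setting.

```python
from collections import Counter


def per_class_prf(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Assumed convention: a metric with an empty denominator is scored as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true instances per class
    totals = [0.0, 0.0, 0.0]
    for label, count in support.items():
        for i, value in enumerate(per_class_prf(y_true, y_pred, label)):
            totals[i] += value * count / n  # weight = class share of the data
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "precision": totals[0],
            "recall": totals[1], "f1": totals[2]}


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["Benign", "Benign", "Trojan", "Spyware", "Ransomware", "Trojan"]
    y_pred = ["Benign", "Trojan", "Trojan", "Spyware", "Trojan", "Trojan"]
    print(weighted_scores(y_true, y_pred))
```

Under weighted averaging, recall always equals overall accuracy, because each class's recall (correct predictions divided by support) is multiplied by its support share. Weighted precision and F1 can differ, which is why an F1-score can sit slightly above or below accuracy.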

Results. A fully functional pipeline was created for the automated processing and classification of RAM dumps, including modules for data preprocessing, feature engineering, machine learning, and results evaluation. Thirteen machine learning algorithms were compared, including classical methods (Random Forest, Gradient Boosting, Decision Tree, k-NN, SVM) and neural network architectures (Wide & Deep Network, CNN). Random Forest achieved the best results on the multi-class malware classification task, with an accuracy of 85.49% and an F1-score of 85.52% at a training time of 1.3 seconds. The system is implemented in Python using scikit-learn (for classical ML models), TensorFlow/Keras (for neural networks), and pandas (for data processing).
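
The model-comparison stage of such a pipeline can be sketched with scikit-learn as follows. This is a hedged illustration, not the authors' code: the data are synthetic (generated with make_classification to mimic 58 features and, as an assumption, four classes), only three of the thirteen algorithms are shown, and the hyperparameters, the stratified 80/20 split, and the helper names build_candidates and compare_models are hypothetical choices made for the example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_candidates(random_state=42):
    """Hypothetical candidate set: 3 of the 13 algorithms, illustrative settings."""
    return {
        # Tree-based models are insensitive to feature scale, so no scaler is used.
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        # k-NN is distance-based, so features are standardized first.
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }


def compare_models(X, y, random_state=42):
    """Fit each candidate, time its training, and report accuracy and weighted F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    results = {}
    for name, model in build_candidates(random_state).items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_seconds = time.perf_counter() - start  # wall-clock time of fit only
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "f1_weighted": f1_score(y_test, y_pred, average="weighted"),
            "train_seconds": train_seconds,
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in for the memory-dump feature table (assumption).
    X, y = make_classification(
        n_samples=2000, n_features=58, n_informative=20, n_classes=4, random_state=42
    )
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]["f1_weighted"])
    for name, r in ranked:
        print(f"{name:14s} acc={r['accuracy']:.4f} f1={r['f1_weighted']:.4f} "
              f"train={r['train_seconds']:.2f}s")
```

To apply the sketch to the real dataset, one would load it with pandas and pass the feature columns as X and the class column as y. The exact column names depend on the dataset release and are not given here. Because training time is measured as wall-clock time around fit, absolute values depend on hardware, and the scores on synthetic data say nothing about the 85.49% accuracy reported above.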

Conclusions. The study confirmed the high effectiveness of classical machine learning methods, particularly ensemble algorithms, for malware detection in RAM dumps. The Random Forest-based model provides an optimal balance between classification quality (F1-score of 85.52%), training speed (1.3 s), and computational efficiency, showing significant advantages over the neural networks evaluated in this context. The system has high practical value and can be integrated into forensic platforms, cybersecurity incident monitoring systems, and expert systems for automated threat detection and faster incident analysis. The results confirm the feasibility of using machine learning methods to build defenses against modern cyber threats that operate exclusively in RAM.

Author Biographies

Yevhen Lanin, V.N. Karazin Kharkiv National University, 6 Svobody sq., Kharkiv, Ukraine, 61022

Master's student at the Education and Research Institute of Computer Sciences and Artificial Intelligence

Nina Bakumenko, V.N. Karazin Kharkiv National University, 6 Svobody sq., Kharkiv, Ukraine, 61022

Ph.D., Associate Professor of the Department of Computer Systems and Robotics, Education and Research Institute of Computer Sciences and Artificial Intelligence
