Anomaly detection methods in sample datasets when managing processes in systems by the state

Keywords: outlier detection, machine learning, process control, quality assessment metrics, deep learning

Abstract

Existing software tools do not detect outliers in data samples and time series with a sufficiently high level of reliability.

This work therefore addresses two questions: which metrics best assess the correctness of outlier detection, and which mathematical models and methods are best suited to detecting outliers in test samples when managing processes in systems by state. Mathematical models and methods for detecting outliers (anomalous values) were used together with Python-based software tools such as scikit-learn, TensorFlow, NumPy, Pandas and others.

The results of the work are: an overview of the metrics used to assess the effectiveness of mathematical models and methods for detecting outliers; an overview of traditional and deep learning techniques for detecting outliers; the results of a study of the efficiency and quality of these models and methods on 12 datasets; and conclusions about the best metric and the best mathematical models and methods for detecting outliers in test samples when managing processes in systems by state.
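As an illustration of how such a comparison can be set up, the following minimal sketch scores two outlier detectors against known labels. The abstract does not name the specific detectors, metrics, or datasets, so everything here is an assumption made for the example: the synthetic data, the choice of scikit-learn's `IsolationForest` and `LocalOutlierFactor`, the metrics (F1, Cohen's kappa, Matthews correlation), and the helper names `make_data` and `evaluate`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import cohen_kappa_score, f1_score, matthews_corrcoef
from sklearn.neighbors import LocalOutlierFactor


def make_data(n_inliers=300, n_outliers=15, seed=0):
    """Build a synthetic 2-D sample (hypothetical data, not the paper's datasets)."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(0.0, 1.0, size=(n_inliers, 2))
    outliers = rng.uniform(-8.0, 8.0, size=(n_outliers, 2))
    X = np.vstack([inliers, outliers])
    # Label convention used here: 0 = inlier, 1 = outlier.
    y = np.concatenate([np.zeros(n_inliers, dtype=int),
                        np.ones(n_outliers, dtype=int)])
    return X, y


def evaluate(X, y_true, contamination=0.05):
    """Fit each detector on X and score its predictions against y_true."""
    detectors = {
        "IsolationForest": IsolationForest(
            contamination=contamination, random_state=0),
        "LocalOutlierFactor": LocalOutlierFactor(
            n_neighbors=20, contamination=contamination),
    }
    results = {}
    for name, detector in detectors.items():
        raw = detector.fit_predict(X)      # scikit-learn returns +1 / -1
        y_pred = (raw == -1).astype(int)   # map -1 (outlier) to label 1
        results[name] = {
            "f1": f1_score(y_true, y_pred),
            "kappa": cohen_kappa_score(y_true, y_pred),
            "mcc": matthews_corrcoef(y_true, y_pred),
        }
    return results


X, y = make_data()
scores = evaluate(X, y)
for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Because outliers are rare, plain accuracy would look high even for a detector that flags nothing. The sketch therefore reports metrics that account for class imbalance.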

The selected methods are intended mainly for monitoring the level of anomalous values in a variety of datasets when managing processes in systems by state, which makes them broadly applicable.

Published
2022-04-11
How to Cite
Lykhach, O., Ugryumov, M., Shevchenko, D., & Shmatkov, S. (2022). Anomaly detection methods in sample datasets when managing processes in systems by the state. Bulletin of V.N. Karazin Kharkiv National University, Series «Mathematical Modeling. Information Technology. Automated Control Systems», 53, 21-40. https://doi.org/10.26565/2304-6201-2022-53-03
Section
Articles