Anomaly detection methods in sample datasets when managing processes in systems by state
Abstract
Current software does not detect outliers in data samples and time series with a sufficiently high level of reliability.
This work therefore addresses two choices: the metrics for assessing the correctness of outlier detection, and the best mathematical models and methods for detecting outliers in test samples when managing processes in systems by state. The study uses mathematical models and methods for detecting outliers (anomalous values), together with Python-based software tools such as scikit-learn, TensorFlow, NumPy, Pandas and others.
The results of this work are: an overview of the metrics used to assess the effectiveness of mathematical models and methods for detecting outliers; an overview of traditional and deep learning techniques for detecting outliers; a study of the efficiency and quality of these models and methods on 12 datasets; and conclusions about the best metric and the best mathematical models and methods for detecting outliers in test samples when managing processes in systems by state.
The selected methods are mainly used to monitor the level of anomalous values in various datasets when managing processes in systems by state, which makes them universal.
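As a rough illustration of what assessing the correctness of outlier detection can look like, the sketch below flags outliers with a simple z-score rule and scores the result against known labels. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data, the 2.5 threshold, the function names `zscore_outliers` and `detection_metrics`, and the choice of precision, recall, F1 and Cohen's kappa as metrics. It uses only the Python standard library rather than the tools named above.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.5):
    """Flag values whose absolute z-score exceeds `threshold` (an assumed cutoff)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def detection_metrics(y_true, y_pred):
    """Compare predicted outlier flags with ground-truth flags (non-empty lists)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # outliers found
    fp = sum(1 for t, p in pairs if not t and p)      # false alarms
    fn = sum(1 for t, p in pairs if t and not p)      # outliers missed
    tn = sum(1 for t, p in pairs if not t and not p)  # normal points kept
    n = len(pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical sample: readings near 10 with three labelled anomalies,
# one of which (12.0) is mild enough that the z-score rule misses it.
values = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 25.0,
          9.9, 10.0, 10.2, -5.0, 10.1, 9.8, 12.0]
y_true = [i in {9, 13, 16} for i in range(len(values))]

flags = zscore_outliers(values)
metrics = detection_metrics(y_true, flags)
for name, score in metrics.items():
    print(f"{name}: {score:.3f}")
```

On this toy sample the detector raises no false alarms but misses the mild anomaly, so precision and recall diverge. This is the kind of difference a single accuracy figure would hide when anomalies are rare.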