Agent-oriented method of clustering the wholesale distributor data
Abstract
The purpose of the research is to improve the accuracy of data clustering and to determine the target number of clusters in data generated by dynamic economic systems, using an agent-oriented clustering method combined with data preprocessing methods.
Research methods: data processing and preparation methods, measures of distance between elements, and clustering methods were used. The software was developed in Python using the scikit-learn, NumPy, SciPy, Pandas, and PyTorch libraries, among others.
As a result of the research, the wholesale distributor data were preprocessed by detecting missing values, assessing skewness, and applying the Box-Cox transformation. The data were then normalized with min-max normalization, and their dimensionality was reduced with PCA and t-SNE. Afterwards, the agent-oriented clustering method was applied with the Manhattan distance, the Mahalanobis distance with the inverse value of the membership function, the Kullback-Leibler divergence, and the cross-entropy. The Kullback-Leibler divergence gave the most accurate results and was chosen for further testing. The ability of the agent-oriented method to determine the number of clusters was also tested: after preprocessing, the data clearly show 3 target clusters, and the method confirmed this number.
Conclusions: The developed method achieves high clustering accuracy thanks to the data preprocessing, a correctly selected distance measure, and the agent-oriented approach. It can be used to improve the quality of data clustering for dynamic economic systems, but it needs further work to make the determination of cluster agent sizes more flexible.
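The preprocessing steps named in the abstract (Box-Cox transformation, min-max normalization, PCA) and the Kullback-Leibler divergence can be illustrated with a minimal sketch using SciPy and scikit-learn. This is not the authors' implementation: the function names `preprocess` and `kl_divergence`, the synthetic log-normal data, and the choice to normalize each scaled row into a probability vector before computing the divergence are all assumptions made for illustration. The t-SNE step and the agent-oriented clustering itself are omitted.

```python
import numpy as np
from scipy.special import rel_entr
from scipy.stats import boxcox, skew
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler


def preprocess(X, n_components=2):
    """Hypothetical helper: Box-Cox per column, min-max scaling, then PCA."""
    X = np.asarray(X, dtype=float)
    if np.isnan(X).any():
        raise ValueError("missing values must be handled before Box-Cox")
    if (X <= 0).any():
        raise ValueError("Box-Cox requires strictly positive inputs")
    # Fit Box-Cox column by column; lambda is estimated by maximum likelihood.
    transformed = np.column_stack(
        [boxcox(X[:, j])[0] for j in range(X.shape[1])]
    )
    # Rescale every feature to the [0, 1] interval.
    scaled = MinMaxScaler().fit_transform(transformed)
    # Project onto the leading principal components.
    reduced = PCA(n_components=n_components).fit_transform(scaled)
    return transformed, scaled, reduced


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for non-negative vectors, each normalized to sum to 1.

    Assumption: rows are treated as discrete distributions; eps avoids
    division by zero for features that were scaled to exactly 0.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(rel_entr(p, q).sum())


# Synthetic right-skewed data standing in for spending figures
# (not the real wholesale dataset).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=8.0, sigma=1.0, size=(200, 6))

transformed, scaled, reduced = preprocess(X)
print("skewness before:", np.round(skew(X, axis=0), 2))
print("skewness after: ", np.round(skew(transformed, axis=0), 2))
print("reduced shape:", reduced.shape)
print("KL between first two scaled rows:", kl_divergence(scaled[0], scaled[1]))
```

The divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; how the method in the paper handles this is not stated in the abstract.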