Analysis of clustering algorithms for product recommendations
Abstract
Relevance. In today's world, where a wide range of goods and services is available, the task of providing personalized recommendations for selecting the right product is becoming increasingly important in many areas, including e-commerce and online platforms. Expert recommendation systems powered by search and clustering algorithms can significantly improve the user experience by offering relevant, personalized product suggestions. A key advantage of using clustering algorithms in recommender systems is the ability to estimate the similarity of objects based on how well they match a given characteristic, which enables an effective search for products by their characteristics. It also allows dividing a user base into separate subgroups that can represent different market segments, preference groups, and the target audience of particular products. Identifying the problems and shortcomings of such systems helps to improve the algorithms, which leads to more accurate forecasts and increased sales.
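To make this idea concrete, the following is a minimal, hypothetical sketch (not the implementation evaluated in this article) of clustering-based recommendation with scikit-learn: products are clustered by their characteristic vectors, and a user's query vector is answered with the products from the nearest cluster. The feature set (price, rating, weight), the data values, and the cluster count are all assumptions made for illustration.

```python
# Minimal sketch: cluster products by characteristics, then recommend the
# products that fall into the same cluster as the user's query.
# All feature names and values below are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical product characteristics: [price, rating, weight]
products = np.array([
    [10.0, 4.5, 1.2],
    [12.0, 4.7, 1.0],
    [95.0, 3.9, 5.5],
    [99.0, 4.1, 5.0],
    [55.0, 4.8, 2.5],
])

scaler = StandardScaler()
X = scaler.fit_transform(products)

# k-means++ seeding is scikit-learn's default initialization for KMeans
model = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)

# The user's requested characteristics, mapped into the same feature space
query = scaler.transform([[11.0, 4.6, 1.1]])
cluster = model.predict(query)[0]

# Recommend every product assigned to the query's cluster
recommended = np.where(model.labels_ == cluster)[0]
print("Recommended product indices:", recommended)
```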
Objective. The purpose of this article is to analyze the effectiveness of cluster analysis methods in recommendation-generation tasks.
Research methods. Comparative analysis, experiment.
Results. The effectiveness of clustering algorithms of different types (k-means++, Mean Shift, and HDBSCAN) for providing product recommendations was analyzed based on the percentage of match with the user's request, RAM usage, and query execution time. The k-means++ algorithm showed the best performance among the tested algorithms.
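A minimal sketch of how such a comparison can be set up in Python is given below, assuming scikit-learn (version 1.3 or later for its HDBSCAN estimator), memory_profiler, and synthetic data; the 1 / (1 + Euclidean distance) similarity is used here only as a stand-in for the "percentage of match with the user's request", and nothing in this sketch reproduces the article's exact dataset or measurements.

```python
# Sketch of a benchmark loop: time each clustering algorithm, sample its RAM
# usage with memory_profiler, and score how well the query's cluster matches
# the query vector. Data, parameters, and the match metric are assumptions.
import time

import numpy as np
from memory_profiler import memory_usage
from sklearn.cluster import HDBSCAN, KMeans, MeanShift  # HDBSCAN requires scikit-learn >= 1.3
from sklearn.datasets import make_blobs

# Synthetic stand-in for a table of product characteristics
X, _ = make_blobs(n_samples=2000, n_features=5, centers=8, random_state=0)
query = X[0]  # stand-in for the characteristics requested by the user

algorithms = {
    "k-means++": KMeans(n_clusters=8, init="k-means++", n_init=10, random_state=0),
    "Mean Shift": MeanShift(),
    "HDBSCAN": HDBSCAN(min_cluster_size=15),
}

for name, model in algorithms.items():
    start = time.perf_counter()
    # memory_usage runs fit_predict and samples RAM (in MiB) while it executes
    peak_mib = max(memory_usage((model.fit_predict, (X,), {})))
    elapsed = time.perf_counter() - start

    labels = model.labels_
    # Products in the same cluster as the query
    # (for HDBSCAN, label -1 marks noise; a real system would handle that case)
    members = X[labels == labels[0]]
    similarity = 1.0 / (1.0 + np.linalg.norm(members - query, axis=1))
    match_pct = similarity.mean() * 100  # assumed proxy for "match with the request"

    print(f"{name:>10}: match={match_pct:5.1f}%  peak RAM={peak_mib:7.1f} MiB  time={elapsed:6.3f} s")
```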
Conclusions. Our analysis confirms the effectiveness of cluster analysis methods in recommender systems. Identifying the problems and shortcomings of such systems makes it possible to improve the algorithms, which leads to more accurate forecasts and increased sales.