Clustering and Classification of Time Series Sound Data
Abstract
This scientific article addresses two critical tasks in data analysis—time series classification and clustering, particularly focusing on heart sound recordings. One of the main challenges in analyzing time series lies in the difficulty of comparing different series due to their variability in length, shape, and amplitude. Various algorithms were employed to tackle these tasks, including the Long Short-Term Memory (LSTM), KNN, recurrent neural network for classification and the K-means and DBSCAN methods for clustering. The study emphasizes the effectiveness of these methods in solving classification and clustering problems involving time series data containing heart sound recordings. The results indicate that LSTM is a powerful tool for time series classification due to its ability to retain contextual information over time. In contrast, KNN demonstrated high accuracy and speed in classification, though its limitations became apparent with larger datasets. For clustering tasks, the K-means method proved to be more effective than DBSCAN, showing higher clustering quality based on metrics such as silhouette score, Rand score, and others. The data used in this research were obtained from the UCR Time Series Archive, which includes heart sound recordings from various categories: normal sounds, murmurs, additional heart sounds, artifacts, and extra systolic rhythms. The analysis of results demonstrated that the chosen classification and clustering methods could be effectively used for diagnosing heart diseases. Furthermore, this research opens up new opportunities for further improvement in data processing and analysis methods, particularly in developing new medical diagnostic tools. Thus, this work illustrates the effectiveness of machine learning algorithms for time series analysis and their significance in improving cardiovascular disease diagnosis.
Downloads
References
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE transactions on neural networks and learning systems, 28(10), 2222-2232. https://doi.org/10.1109/TNNLS.2016.2582924
Zhang, Z. (2004). Nearest neighbor search algorithms and applications. Springer. https://doi.org/10.1007/978-3-319-14717-8_39
Dasarathy, B. V. (1991). Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media. https://doi.org/10.1007/978-0-387-84858-7
Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 645–678. https://doi.org/10.1109/TNN.2005.845141
Martin Ester, Jörg Sander (1996). "Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications". Data Mining and Knowledge Discovery. 2 (2): 169–194. https://doi.org/10.1007/BF00457189
Hoang A.D., Bagnall A., Kaveh K., Chin-Chia M.Y., Zhu Y., Shaghayegh G., Chotirat A.R., Eamonn K. The UCR Time Series Archive https://arxiv.org/abs/1810.07758
Schubert, E., Sander, J., Ester, M., Kriegel, H.-P., & Xu, X. (2017). "DBSCAN revisited, revisited: why and how you should (still) use DBSCAN". ACM Transactions on Database Systems (TODS), 42(3), 19. https://doi.org/10.1145/3068335
Kachanov Stanislav (2024) Clustering and Classification of Time Series Data (master diploma) V. N. Karazin Kharkiv National University
Copyright (c) 2024 Computer Science and Cybersecurity
This work is licensed under a Creative Commons Attribution 4.0 International License.