Clustering and Classification of Time Series Sound Data

doi:10.26565/2519-2310-2024-1-04

Stanislav Kachanov, PhD student, V. N. Karazin Kharkiv National University, Ukraine, https://orcid.org/0009-0002-6938-6717
Dmytro Vlasenko, Senior Lecturer, Department of Theoretical and Applied Computer Sciences, PhD in Mathematics, V. N. Karazin Kharkiv National University, Ukraine, https://orcid.org/0009-0006-8780-2066

Keywords: time series classification, time series clustering, recurrent neural network, LSTM

Abstract

This article addresses two important tasks in data analysis, time series classification and time series clustering, with a focus on heart sound recordings. A central challenge in time series analysis is that series are difficult to compare because they vary in length, shape, and amplitude. Several algorithms were applied to these tasks: Long Short-Term Memory (LSTM) recurrent neural networks and k-nearest neighbors (KNN) for classification, and K-means and DBSCAN for clustering. The data were obtained from the UCR Time Series Archive, which includes heart sound recordings in several categories: normal sounds, murmurs, additional heart sounds, artifacts, and extrasystolic rhythms. The results indicate that LSTM is a powerful tool for time series classification because it retains contextual information over time. KNN also showed high accuracy and speed in classification, although its limitations became apparent on larger datasets. For clustering, K-means proved more effective than DBSCAN, achieving higher clustering quality on metrics such as the silhouette score and the Rand score, among others. The analysis of the results demonstrated that the chosen classification and clustering methods can be used effectively for diagnosing heart diseases. The work also opens opportunities for further improvement of data processing and analysis methods, in particular for developing new medical diagnostic tools, and illustrates the value of machine learning algorithms for time series analysis in cardiovascular disease diagnosis.
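To make the comparison problem concrete, the following minimal Python sketch shows one common way to make series of different length and amplitude comparable before applying KNN: resample each series to a fixed length, z-normalize it, and classify by majority vote among the nearest neighbors under Euclidean distance. This is an illustration under stated assumptions, not the authors' pipeline. The function names, the target length of 32, the value k=3, and the synthetic "sine" and "ramp" series are all hypothetical and stand in for real heart sound recordings.

```python
import math
from collections import Counter


def resample(series, length):
    """Linearly interpolate a series onto `length` evenly spaced points."""
    if len(series) == 1 or length == 1:
        return [float(series[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(series) - 1) / (length - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def z_normalize(series):
    """Rescale to zero mean and unit variance, removing offset and amplitude."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)  # a constant series has no shape to compare
    return [(x - mean) / std for x in series]


def prepare(series, length=32):
    """Bring a raw series to a common length and scale (assumed preprocessing)."""
    return z_normalize(resample(series, length))


def knn_predict(train, query, k=3, length=32):
    """Majority label among the k training series closest to the query.

    `train` is a list of (series, label) pairs; distance is Euclidean,
    computed after the preprocessing in `prepare`.
    """
    q = prepare(query, length)
    scored = sorted(
        ((math.dist(prepare(s, length), q), label) for s, label in train),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Synthetic stand-ins for two classes (hypothetical toy data).
def sine(n, amp):
    return [amp * math.sin(2 * math.pi * i / (n - 1)) for i in range(n)]


def ramp(n, amp):
    return [amp * i / (n - 1) for i in range(n)]


train = [
    (sine(40, 1.0), "sine"), (sine(60, 5.0), "sine"), (sine(25, 0.5), "sine"),
    (ramp(30, 2.0), "ramp"), (ramp(50, 10.0), "ramp"), (ramp(45, 0.3), "ramp"),
]

print(knn_predict(train, sine(80, 3.0)))  # prints "sine"
print(knn_predict(train, ramp(20, 7.0)))  # prints "ramp"
```

Because every training series is re-prepared and compared for each query, the cost of a prediction grows linearly with the size of the training set, which is consistent with the limitation on larger datasets noted in the abstract.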
