Application of Exploratory Data Analysis for Investigating Factors Influencing Sleep Quality
Abstract
Relevance. Sleep quality is shaped by many interacting factors, and studying it requires the analysis of large datasets. This is not feasible without exploratory data analysis (EDA) methods that identify hidden patterns, so developing approaches for the intelligent analysis of factors influencing sleep is a relevant scientific and technical task.
Goal. To examine and identify the relationships between physiological, behavioral, and environmental factors and sleep quality using exploratory data analysis methods.
Research methods. The research was based on EDA methods, aimed primarily at checking for correlations between sleep quality and variables such as sleep duration, stress level, and physical activity. A heatmap was then constructed to identify latent relationships and to select the most relevant features. In addition, linear regression, decision tree, and logistic regression models were used to investigate the factors influencing human sleep quality.
Results. The paper presents results obtained with a software application developed for analyzing factors influencing human sleep quality. The application has a graphical user interface and supports data loading, exploratory data analysis, model construction, and result visualization. It supports both classification and regression algorithms, so it can be adapted to a wide range of analytical tasks. The obtained results were analyzed, and the models with the highest accuracy, adaptability to complex relationships, and interpretability were identified.
Conclusions. The results confirm the versatility of decision tree methods for analyzing sleep-related factors. Their accuracy and algorithmic transparency make this approach the best suited, within the scope of the study, for modeling complex interrelationships. Overall, analyzing factors that influence sleep with EDA methods makes it possible to turn complex data into meaningful analytical models, which is a relevant task for digital medicine.
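The correlation step described under Research methods can be illustrated with a minimal sketch. It is not the authors' application: the data are synthetic, the column names and the assumed relationship between them are hypothetical, and only NumPy is used. The sketch computes the Pearson correlation matrix that a heatmap would display, ranks features by the strength of their correlation with sleep quality, and fits an ordinary least squares model as a simple stand-in for the regression stage.

```python
import numpy as np

# Synthetic stand-in data. Column names and effect sizes are assumptions
# made for illustration; they do not come from the study's dataset.
rng = np.random.default_rng(0)
n = 500
sleep_duration = rng.normal(7.0, 1.0, n)        # hours per night
stress_level = rng.normal(5.0, 1.5, n)          # arbitrary rating scale
physical_activity = rng.normal(60.0, 20.0, n)   # minutes per day
noise = rng.normal(0.0, 0.5, n)

# Assumed relationship, chosen only so the example has structure to find.
sleep_quality = (0.8 * sleep_duration
                 - 0.6 * stress_level
                 + 0.01 * physical_activity
                 + noise)

names = ["sleep_duration", "stress_level", "physical_activity", "sleep_quality"]
data = np.column_stack(
    [sleep_duration, stress_level, physical_activity, sleep_quality]
)

# Pearson correlation matrix: these are the values a heatmap would colour.
corr = np.corrcoef(data, rowvar=False)

# Rank candidate features by absolute correlation with the target.
target = names.index("sleep_quality")
ranking = sorted(
    ((names[i], corr[i, target]) for i in range(len(names)) if i != target),
    key=lambda item: abs(item[1]),
    reverse=True,
)

# Ordinary least squares fit with an intercept column.
X = np.column_stack([np.ones(n), data[:, :target]])
y = data[:, target]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Coefficient of determination on the same data (no train/test split here).
pred = X @ coef
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

for name, value in ranking:
    print(f"{name}: r = {value:+.2f}")
print(f"R^2 = {r2:.2f}")
```

On real data, the ranking would only suggest which features to carry into the models. The decision tree and logistic regression models named in the abstract would need a dedicated library and a held-out test set to compare accuracy fairly.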