Entropy of DNA sequences and leukemia patients mortality

Keywords: entropy, DNA sequence, patient survival, leukemia


Introduction. Deoxyribonucleic acid (DNA) is not a random sequence of combinations of four nucleotides: comprehensive reviews [1, 2] persuasively show long- and short-range correlations in DNA, periodic properties, and the correlation structure of sequences. Information theory methods, such as entropy, quantify the amount of information contained in sequences. The relationship between entropy and patient survival is widely studied in several branches of medicine and medical research: cardiology, neurology, surgery, and trauma. It therefore appears necessary to apply the advantages of information theory methods to explore the relationship between the mortality of certain categories of patients and the entropy of their DNA sequences.

Aim of the research. The goal of this paper is to provide a reliable formula for accurately calculating entropy for short DNA sequences and to show how existing entropy analysis can be used to examine the mortality of leukemia patients.

Materials and Methods. We used the University of Barcelona (UB) leukemia patient database (DB), which contains 117 anonymized records, each consisting of: date of the patient's diagnosis, date of the patient's death, leukemia diagnosis, and the patient's DNA sequence. The average time from diagnosis to death was 99 ± 77 months. The formal characteristics of the DNA sequences in the UB leukemia patient DB are: average number of bases N = 496 ± 69; min(N) = 297 bases; max(N) = 745 bases. A generalized form of the Robust Entropy Estimator (EnRE) for short DNA sequences was proposed, and the key EnRE features were shown. Survival analysis was performed in the statistical package IBM SPSS 27 using Kaplan-Meier survival analysis and Cox regression survival modelling.

Results. The accuracy of the proposed EnRE for calculating entropy was demonstrated for various lengths of time series and various types of random distributions. It was shown that in all cases for N = 500, the relative error in calculating the precise value of entropy does not exceed 1%, while the correlation is no lower than 0.995. To yield the minimum EnRE standard deviation and coefficient of variation, the alphabet code of the initial DNA sequence was converted into an integer code of bases, using an optimization rule that selects the single minimal numerical coding around zero. EnRE was calculated for leukemia patients under two groupings: two groups divided by the median EnRE = 1.47, and two groups formed from patients belonging to the 1st (EnRE ≤ 1.448) and 4th (EnRE ≥ 1.490) quartiles. The results of the Kaplan-Meier survival analysis and Cox regression survival modelling are statistically significant: p < 0.05 for the median groups and p < 0.005 for the groups formed from the 1st and 4th quartiles. The death hazard for a patient with EnRE below the median is 1.556 times that of a patient with EnRE above the median, and the death hazard for a patient in the 1st entropy quartile (lowest EnRE) is 2.143 times that of a patient in the 4th entropy quartile (highest EnRE).

Conclusions. The transition from wider (median) to smaller (quartile) patient groups with greater EnRE differentiation confirmed the unique significance of the entropy of DNA sequences for the mortality of leukemia patients. This significance is demonstrated statistically by the increasing hazard and the decreasing average time from diagnosis to death for leukemia patients with lower entropy of DNA sequences.
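The grouping step described in the Results can be illustrated with a minimal sketch. The EnRE formula itself is not given in this abstract, so the sketch substitutes the plain Shannon entropy of base frequencies (in bits per base) as a stand-in; it is not the authors' estimator. The function names and the toy sequences are hypothetical and are not taken from the UB database.

```python
import math
from collections import Counter
from statistics import median


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the base frequencies in seq.

    Stand-in for EnRE, whose formula is not stated in the abstract.
    """
    counts = Counter(seq.upper())
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def split_by_median(values):
    """Return index lists (low, high): values <= median and values > median."""
    m = median(values)
    low = [i for i, v in enumerate(values) if v <= m]
    high = [i for i, v in enumerate(values) if v > m]
    return low, high


# Hypothetical toy sequences, far shorter than the 297-745 bases in the study.
sequences = ["AAAAAAAA", "AACCGGTT", "AAAACCGT", "ACGTACGA"]
entropies = [shannon_entropy(s) for s in sequences]
low, high = split_by_median(entropies)
```

The two index lists play the role of the "below median" and "above median" patient groups, which would then be passed to a survival analysis such as Kaplan-Meier or Cox regression. The quartile grouping follows the same pattern with the 1st and 4th quartile cut-offs in place of the median.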



Author Biographies

Oleksandr Martynenko, V. N. Karazin Kharkiv National University

D.Sc., Ph.D., Full Professor, Department of Hygiene and Social Medicine, School of Medicine, V. N. Karazin Kharkiv National University, 6, Svobody sq., Kharkiv, Ukraine, 61022

Pastor Xavier Duran, Department of Surgery and Medical-Surgical, University of Barcelona

Doctor of Medicine and Surgery, University Professor, Department of Surgery and Medical-Surgical, University of Barcelona, Chief of Medical Informatics Unit, Hospital Clinic, 170, Villarroel st. Barcelona, Spain, 08036

Frid Santiago Andres, Medical Informatics Unit, Hospital Clínic de Barcelona

MD, M.Sc., Medical Associated Professor, Department of Clinical Foundations, School of Medicine, Universitat de Barcelona, 143, Casanova st., Barcelona, Spain, 08036. Chief of Area of Projects and Developments, Medical Informatics Unit, Hospital Clínic de Barcelona, 170, Villarroel st., Barcelona, Spain, 08036

Gil Rojas Jessyca, Hospital Clínic de Barcelona

MSc, Data Manager, Medical Informatics Unit, Hospital Clínic de Barcelona, 170, Villarroel st., Barcelona, Spain, 08036

Liudmila Maliarova, V. N. Karazin Kharkiv National University

Assistant, Department of Hygiene and Social Medicine, School of Medicine, V. N. Karazin Kharkiv National University, 6, Svobody sq., Kharkiv, Ukraine, 61022


Li WT. The study of correlation structures of DNA sequences: a critical review. Comput. Chem. 1997; 21 (4): 257–271. DOI: https://doi.org/10.1016/s0097-8485(97)00022-3

Damasevicius R. Complexity estimation of genetic sequences using information-theoretic and frequency analysis methods. Informatica. 2010; 21 (1): 13–30. DOI: https://doi.org/10.15388/Informatica.2010.270

Rowe GW, Trainor LEH. On the informational content of viral DNA. J. Theoretical Biology. 1983; 101: 151–170. DOI: https://doi.org/10.1016/0022-5193(83)90332-6

Vopson MM, Robson SC. A new method to study genome mutations using the information entropy. Physica A. 2021; 1–9. DOI: https://doi.org/10.1016/j.physa.2021.126383

Sherwin WB. Entropy and Information Approaches to Genetic Diversity and its Expression: Genomic Geography. Entropy. 2010;12:1765-1798. DOI: https://doi.org/10.3390/e12071765

Chanda P, Costa E, Hu J, Sukumar S, Van Hemert J, Walia R. Information Theory in Computational Biology: Where We Stand Today. Entropy. 2020;22:627-637. DOI: https://doi.org/10.3390/e22060627

Villareal RP, Liu BC, Massumi A. Heart rate variability and cardiovascular mortality. Curr Atheroscler Rep. 2002; 4: 120–127. DOI: https://doi.org/10.1007/s11883-002-0035-18

Rodríguez J, Correa C, Ramírez L. Heart dynamics diagnosis based on entropy proportions: Application to 550 dynamics. Revista Mexicana de Cardiología. 2017; 28 (1): 10–20.

Androulakis AFA, Zeppenfeld K, Paiman EHM, Piers SRD, Wijnmaalen AP, Siebelink HJ, Sramko M, Lamb HJ, van der Geest RJ, de Riva M, Tao Q. Entropy as a Novel Measure of Myocardial Tissue Heterogeneity for Prediction of Ventricular Arrhythmias and Mortality in Post-Infarct Patients. JACC Clin Electrophysiol. 2019 Apr;5 (4): 480–489. DOI: https://doi.org/10.1016/j.jacep.2018.12.005. Epub 2019 Feb 27. PMID: 31000102.

Sykora M, Szabo J, Siarnik P, Turcani P, Krebs S, Lang W, Czosnyka M, Smielewski P. Heart rate entropy is associated with mortality after intracerebral hemorrhage. Journal of the Neurological Sciences. 2020; 418: 117033. DOI: https://doi.org/10.1016/j.jns.2020.117033

Matsuda E. Entropy Monitoring in Patients Undergoing General Anesthesia. Am J Nurs. 2017 Mar;117(3):62. DOI: https://doi.org/10.1097/01.NAJ.0000513290.22001.8d

Neal-Sturgess C. The Entropy of Morbidity Trauma and Mortality. Arxiv Cornell University. Med. Physics. 2010; 1–20. DOI: https://doi.org/10.48550/arxiv.1008.3695

Norris PR, Anderson SM, Jenkins JM, Williams AE, Morris JAJr. Heart rate multiscale entropy at three hours predicts hospital mortality in 3,154 trauma patients. Shock. 2008 Jul; 30 (1): 17–22. DOI: https://doi.org/10.1097/SHK.0b013e318164e4d0

Papaioannou VE, Chouvarda IG, Maglaveras NK, Baltopoulos GI, Pneumatikos IA. Temperature multiscale entropy analysis: a promising marker for early prediction of mortality in septic patients. Physiol Meas. 2013 Nov;34(11):1449-66. DOI: https://doi.org/10.1088/0967-3334/34/11/1449

Weir BS. Statistical analysis of molecular genetic data. IMA J. of Math. Applied in Medicine and Biology. 1985; 2:1–39.

Shannon CE. A Mathematical Theory of Communication. Bell System Technical Journal. 1948; 27 (3): 379–423. DOI: https://doi.org/10.1002/j.1538-7305.1948.tb01338.x

Lazo A, Rathie P. On the entropy of continuous probability distributions. IEEE Transactions on Information Theory. 1978;24(1). DOI: https://doi.org/10.1109/TIT.1978.1055832

Gini C, Ottaviani G. Università di Roma. Memorie Di Metodologia Statistica. Roma: E.V. Veschi; 1955.

Sánchez-Hechavarría ME, et al. Introduction of Application of Gini Coefficient to Heart Rate Variability Spectrum for Mental Stress Evaluation. Arq Bras Cardiol. 2019; [online ahead of print]. DOI: https://doi.org/10.5935/abc.20190185

Firebaugh G. Empirics of World Income Inequality. American Journal of Sociology. 1999; 104 (6): 1597–1630. DOI: https://doi.org/10.1086/210218

Shorrocks AF. The Class of Additively Decomposable Inequality Measures. Econometrica. 1980; 48 (3): 613–625. DOI: https://doi.org/10.2307/1913126

Martynenko A, Raimondi G, Budreiko N. Robust Entropy Estimator for Heart Rate Variability. Klin. Inform. Telemed. 2019; 14 (15): 67–73. DOI: https://doi.org/10.31071/kit2019.15.06

How to Cite
Martynenko, O., Duran, P. X., Andres, F. S., Jessyca, G. R., & Maliarova, L. (2022). Entropy of DNA sequences and leukemia patients mortality. The Journal of V. N. Karazin Kharkiv National University, Series "Medicine", (45). https://doi.org/10.26565/2313-6693-2022-45-02