Statistical analysis of medical time series

Statistical analysis of data sets is a necessary component of any medical research. Modern methods of mathematical statistics and statistical application suites provide extensive capabilities for the analysis of random values. However, when a data set is a series of observations ordered in time, or when the structure and order of the data are essential to the research, special approaches to statistical analysis become necessary. This article presents special statistical methods developed by the authors for the analysis of time series: the Time Series Mann-Whitney M-test, an analogue of the well-known nonparametric Mann-Whitney U-test for two Time Series with an equal number of elements; the Nominal Time Series Measure, a statistical estimator of the dynamics of a nominal series consisting of «0» (no) and «1» (yes); and the Time Series Entropy EnRE, a robust formula developed specifically for Time Series and intended for calculating a nonlinear stochastic measure of order or disorder of the kind widely used in research. The methods are accompanied by a detailed demonstration of their capacity for statistical analysis of medical Time Series: analysis of the growth dynamics of boys and girls aged 6–7–8 years (data of the World Health Organization); analysis of the number of seizures and the choice of anti-epileptic drugs (data of The National Society for Epilepsy); and Time Series Entropy EnRE for detecting congestive heart failure from standard 5-minute Heart Rate Variability samples (data of the Massachusetts Institute of Technology – Boston's Beth Israel Hospital RR database). In every case, the use of these special methods made it possible to avoid errors in interpreting the results of statistical analysis of medical Time Series and substantially increased the accuracy of that analysis.


OBJECTIVE
Statistical analysis of data sets is a necessary component of any medical research. Modern methods of mathematical statistics and statistical application suites provide extensive capabilities for the analysis of random values. However, when a data set is a series of observations ordered in time, or when the structure and order of the data are essential to the research, special approaches to statistical analysis become necessary. Presented in this article are the following special statistical methods developed by the authors for the analysis of Time Series:
1. Time Series Mann-Whitney M-test: an analogue of the well-known nonparametric Mann-Whitney U-test for two Time Series with an equal number of elements;
2. Nominal Time Series Measure: a statistical estimator of the dynamics of a nominal series consisting of «0» (no) and «1» (yes). Such dichotomous Time Series are often used to describe qualitative events in the development or course of a disease;
3. Time Series Entropy EnRE: a robust formula developed specifically for Time Series and intended for calculating a nonlinear stochastic measure of order or disorder of the kind widely used in research. In medicine, such nonlinear methods have proved most valuable for the analysis and prognosis of sudden changes in medical condition, such as atrial fibrillation, epileptic seizures, etc.
All presented methods are accompanied by a detailed demonstration of capacity for statistical analysis of medical Time Series.

MATERIALS AND METHODS
The time series were analyzed with methods, algorithms and programs developed by the authors, and the results were compared with those of established statistical methods available in the software suites «IBM SPSS Statistics» and «Statistica» (StatSoft).
To illustrate the methods for statistical analysis of time series developed by the authors, the following cases have been used: Case 1: height-for-age data of the World Health Organization (WHO) for school-aged children and adolescents [1]. The growth curves for ages 5 to 19 years were constructed using data from 18 months to 24 years.
An additional correction of σ is performed when a small number of tied ranks has to be taken into account. However, such an approach is not applicable if all elements of the data sets are interconnected as a single sequence of a time series.
To use the test for comparing Time Series with an equal number of elements N, we have proposed a modification, the Time Series Mann-Whitney M-test, with a formula for M: In this case, are ranks of the elements in the original time series, are ranks of the time series elements in the general series after merging. The proposed modification takes into account how the positions of the elements of a time series change when the data are merged into a single sequence, by calculating their total distances. It should be noted that the critical values of the Time Series Mann-Whitney M-test are the same as those of the Mann-Whitney U-test.
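The remark that the classical test ignores the sequence structure of the data can be checked with a minimal sketch. The code below does not implement the M-test; it computes only the classical Mann-Whitney U statistic in plain Python. The series `a` and `b` are hypothetical values chosen for illustration, and the function names are likewise assumptions of this sketch.

```python
def midranks(values):
    """Return 1-based ranks of `values`, averaging the ranks of tied elements."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of elements tied with the element at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Classical U statistic, min(U_x, U_y), from the rank sum of x in the pooled sample."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    return min(u_x, u_y)


# Hypothetical series of equal length (illustrative values, not data from [1]).
a = [0.50, 0.52, 0.55, 0.59, 0.64, 0.70]
b = [0.48, 0.49, 0.51, 0.54, 0.58, 0.63]
b_reversed = b[::-1]  # same values, opposite trend in time

u_original = mann_whitney_u(a, b)
u_reversed = mann_whitney_u(a, b_reversed)
print(u_original, u_reversed)
```

Reversing `b` turns a rising series into a falling one, yet the U statistic is unchanged, because it depends only on the pooled ranks of the values and not on their positions in time. This is the property that motivates a modification sensitive to element positions.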

Case 1:
Analysis of growth dynamics of boys and girls (6-7 years) and (6-8 years), according to data of the WHO [1].
To illustrate the use of the developed Time Series MW M-test, we analyze the growth dynamics of boys and girls (6-7 years) and (6-8 years), according to data of the WHO [1]. The growth dynamics are estimated by the monthly increment of the growth median in children. Table 1.a shows that in boys and girls aged 6-7 years the growth dynamics differ at the mean level, which is confirmed, at a significance level of p < 0.05, by all three tests conducted: Student's t-test, the Mann-Whitney U-test and the Time Series MW M-test. When a similar analysis is conducted for children aged 6-8 years, the mean values of ΔL come closer together because of the nonlinearity of the curve of monthly increments of the growth median in girls (Table 1.b). In this case, Student's t-test and the Mann-Whitney U-test are not applicable: although the growth dynamics actually change more than in children aged 6-7 years, these tests are insensitive to substantial changes in the time series. The Time Series MW M-test correctly detects a significant difference between the time series at the level of p < 0.05, consistent with the decrease of the correlation coefficient from 0.986 (6-7 years) to 0.8 (6-8 years).
• Mn < 0.5: from the position of an observer at the end of the series, negative events or responses are prevalent;
• Mn > 0.5: from the position of an observer at the end of the series, positive events or responses are prevalent.

Case 2:
Analysis of the number of seizures and the choice of anti-epileptic drugs (according to data of The National Society for Epilepsy (NSE), https://www.epilepsysociety.org.uk/).
The number of epileptic seizures is a distinctive indicator for estimating the severity of epilepsy and for choosing antiepileptic drug (AED) therapy. With the right AED, up to 70 % of people with epilepsy could have their seizures controlled or stopped. Shown in Table 2 are cases of epileptic seizure 2 and 4 weeks before treatment and during AED treatment.
Table 2. Seizures before and during AED therapy (by day: before treatment, AED treatment).
Days with a registered epileptic seizure are marked as «1», and days without an epileptic seizure are marked as «0». A simple count of the number of epileptic seizures 2 weeks before treatment and during 2 weeks of AED treatment does not allow any conclusion about the quality of the applied AED therapy, because an equal number of seizures is observed before and during the therapy. However, the objective clinical picture led the doctor to continue the existing AED therapy for another 2 weeks. The doctor's choice proved correct: over a 4-week timespan, the number of epileptic seizures during AED treatment decreased compared with the 4 weeks before treatment. Had the doctor been able to estimate quantitatively the dynamics of change in the nominal series, the choice would also have been statistically grounded:
• 2 weeks before treatment: Mn = 0.866;
• 2 weeks during AED treatment: Mn = 0.899;
• 4 weeks before treatment: Mn = 0.880;
• 4 weeks during AED treatment: Mn = 0.972.
Thus, even during the first 2 weeks of AED treatment the measure Mn showed positive dynamics, and over the 4-week timespan from the beginning of AED treatment the Mn estimates improved considerably.
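Why a plain seizure count cannot separate the two 2-week periods can be seen in a short sketch. The two 14-day series below are hypothetical (they are not the values of Table 2), and the helper names are assumptions of this sketch; the code does not compute Mn, it only shows that a count discards the positions of the «1» entries.

```python
def seizure_count(series):
    """Total number of days marked 1 (seizure registered)."""
    return sum(series)


def seizure_days(series):
    """1-based day numbers on which a seizure was registered."""
    return [day for day, flag in enumerate(series, start=1) if flag == 1]


# Hypothetical 14-day nominal series: 1 = seizure, 0 = no seizure.
before = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
during = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(seizure_count(before), seizure_count(during))
print(seizure_days(before))
print(seizure_days(during))
```

Both series contain the same number of seizures, but in the second one all seizures fall in the first 8 days and the last 6 days are seizure-free. An estimator that takes the position of each event into account can register this difference, whereas the count cannot.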

Time Series Entropy
Nonlinear statistical methods, such as entropy, have found widespread use and have shown great efficiency in the analysis of various medical data [4,5]. Various methods of calculating entropy have been developed, and special journal issues have been dedicated to some of them and to their use in medicine [6]. However, these methods of entropy calculation share a common characteristic: they are insensitive to changes in data structure, i.e. the data could be either randomly shuffled or ordered, and the entropy would not change. Known methods are also usually very demanding with respect to the quantity of analyzed data: the required number of data points can exceed thousands, which is sometimes hardly achievable in medical research. We have developed and proposed a robust formula for calculating the entropy of time series, EnRE [7]: where MD is the median of the time series; Dij is the distance between observed data points Xi and Xj in the time series; A, l, m, k are estimated coefficients. The search conditions for the coefficients A, l, m, k are the following [7]:
1/ accurate approximation for known distributions of a random value;
2/ independence of EnRE from N for the initial time series and for the series after sorting;
3/ independence of EnRE from additive changes of the mean.
In [7], it was established that, for time series represented by RR-intervals, the following coefficient values had been found: Let us note important characteristics of the generalized form of EnRE and of the coefficients found:
1/ EnRE [7] and the coefficients l, m, k found provide independence from an additive change of the series mean and from the sample size N, both for the basic series and for the series after sorting;
2/ the value of EnRE is sensitive to structural changes in the series, such as sorting, which increases the degree of order in the series and decreases EnRE. This offers additional advantages in research, as shown below for the classification of NSR and CHF groups;
3/ only the coefficient A may need to be readjusted to find the best EnRE value in another range of parameters of various random distributions, which can always be done using the method of least squares.

Case 3:
Time Series Entropy EnRE for detecting congestive heart failure from standard 5-minute HRV samples (MIT-BIH RR database) [2].
Let us demonstrate the use of EnRE for detecting congestive heart failure in short segments (N = 500) of the MIT-BIH RR database. In [8], it was shown that the minimal length of an RR-segment for which NSR and CHF groups can be classified by Multiscale Entropy Analysis is N = 1000.
The performance of such classification is: Se = 0.70; Sp = 0.76; Acc = 0.74. Table 3 gives the mean and standard deviation of EnRE for NSR and CHF, for the basic RR-intervals and for the series after sorting (N = 500). In both cases, the differences between the groups are significant at the level of p < 10^-7. Therefore, the proposed generalized form of the Robust Entropy Estimator EnRE allows NSR and CHF groups to be separated with high accuracy in short records (N = 500), which was not achieved in [8] by Multiscale Entropy Analysis, and it offers additional advantages in the case of structural changes in the series (such as sorting). The quality of classification achieved using the two variables EnRE and EnRE(sort) is superior to the results obtained in [8] by Multiscale Entropy Analysis for RR segments of length N = 1000, N = 2000 and N = 5000.
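The statement made earlier in this section, that conventional entropy estimates do not change when the data are shuffled or ordered, can be illustrated with a minimal sketch. The code below is not EnRE: it computes a plain Shannon entropy of a histogram and, for contrast, a simple order-sensitive descriptor (the mean absolute successive difference), which is only an illustrative stand-in chosen for this sketch. The RR-interval values, the bin width and the function names are all hypothetical.

```python
import math
from collections import Counter


def histogram_entropy(series, bin_width):
    """Shannon entropy (in bits) of the series values grouped into fixed-width bins."""
    bins = Counter(math.floor(x / bin_width) for x in series)
    n = len(series)
    # Iterate over sorted counts so the summation order does not depend on data order.
    return -sum((c / n) * math.log2(c / n) for c in sorted(bins.values()))


def mean_abs_successive_diff(series):
    """Mean absolute difference between neighbouring elements (depends on order)."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical RR intervals in milliseconds (illustrative, not from the MIT-BIH database).
rr = [812, 790, 845, 801, 776, 830, 798, 822, 785, 840]
rr_sorted = sorted(rr)

h_original = histogram_entropy(rr, bin_width=20)
h_sorted = histogram_entropy(rr_sorted, bin_width=20)
d_original = mean_abs_successive_diff(rr)
d_sorted = mean_abs_successive_diff(rr_sorted)

print(h_original, h_sorted)  # identical: the histogram ignores order
print(d_original, d_sorted)  # different: sorting changes neighbouring differences
```

The histogram entropy is the same for the original and the sorted series because sorting leaves the bin counts untouched, while the order-sensitive descriptor drops after sorting. An entropy measure that reacts to sorting therefore carries information that the histogram-based estimate cannot provide.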

CONCLUSIONS
Developed by the authors and presented in this article are special methods for statistical analysis of Time Series:
1. Time Series Mann-Whitney M-test: an analogue of the well-known nonparametric Mann-Whitney U-test for two Time Series with an equal number of elements;
2. Nominal Time Series Measure: a statistical estimator of the dynamics of a nominal series consisting of «0» (no) and «1» (yes);
3. Time Series Entropy EnRE: a robust formula developed specifically for Time Series and intended for calculating a nonlinear stochastic measure of order or disorder of the kind widely used in research.
All presented methods are accompanied by a detailed demonstration of their capacity for statistical analysis of medical Time Series:
1. Analysis of the growth dynamics of boys and girls aged 6-7-8 years (WHO);
2. Analysis of the number of seizures and the choice of anti-epileptic drugs (NSE);
3. Time Series Entropy EnRE for detecting congestive heart failure from standard 5-minute HRV samples (MIT-BIH RR database).
In every case, the use of these special methods made it possible to avoid errors in interpreting the results of statistical analysis of medical Time Series and substantially increased the accuracy of that analysis.