The Diabits App for Smartphone-Assisted Predictive Monitoring of Glycemia in Patients With Diabetes: Retrospective Observational Study

Background Diabetes mellitus, which causes dysregulation of blood glucose in humans, is a major public health challenge. Patients with diabetes must monitor their glycemic levels to keep them in a healthy range. This task is made easier by using continuous glucose monitoring (CGM) devices and relaying their output to smartphone apps, thus providing users with real-time information on their glycemic fluctuations and possibly predicting future trends. Objective This study aims to discuss various challenges of predictive monitoring of glycemia and examines the accuracy and blood glucose control effects of Diabits, a smartphone app that helps patients with diabetes monitor and manage their blood glucose levels in real time. Methods Using data from CGM devices and user input, Diabits applies machine learning techniques to create personalized patient models and predict blood glucose fluctuations up to 60 min in advance. These predictions give patients an opportunity to take pre-emptive action to maintain their blood glucose values within the reference range. In this retrospective observational cohort study, the predictive accuracy of Diabits and the correlation between daily use of the app and blood glucose control metrics were examined based on real app users’ data. Moreover, the accuracy of predictions on the 2018 Ohio T1DM (type 1 diabetes mellitus) data set was calculated and compared against other published results. Results On the basis of more than 6.8 million data points, 30-min Diabits predictions evaluated using Parkes Error Grid were found to be 86.89% (5,963,930/6,864,130) clinically accurate (zone A) and 99.56% (6,833,625/6,864,130) clinically acceptable (zones A and B), whereas 60-min predictions were 70.56% (4,843,605/6,864,130) clinically accurate and 97.49% (6,692,165/6,864,130) clinically acceptable. By analyzing daily use statistics and CGM data for the 280 most long-standing users of Diabits, it was established that under free-living conditions, many common blood glucose control metrics improved with increased frequency of app use. For instance, the average blood glucose for the days these users did not interact with the app was 154.0 (SD 47.2) mg/dL, with 67.52% of the time spent in the healthy 70 to 180 mg/dL range. For days with 10 or more Diabits sessions, the average blood glucose decreased to 141.6 (SD 42.0) mg/dL (P<.001), whereas the time in euglycemic range increased to 74.28% (P<.001). On the Ohio T1DM data set of 6 patients with type 1 diabetes, 30-min predictions of the base Diabits model had an average root mean square error of 18.68 (SD 2.19) mg/dL, which is an improvement over the published state-of-the-art results for this data set. Conclusions Diabits accurately predicts future glycemic fluctuations, potentially making it easier for patients with diabetes to maintain their blood glucose in the reference range. Furthermore, an improvement in glucose control was observed on days with more frequent Diabits use.


Introduction
Background Diabetes mellitus is one of the biggest public health challenges of our days. Globally, the number of adults living with the disease has risen from 108 to 422 million between 1980 and 2014, constituting about 8.5% of the worldwide adult population [1]. The complications of diabetes caused by increased blood glucose levels (hyperglycemia) include both macrovascular (ischemic heart disease, cerebrovascular disease, peripheral vascular disease leading to lower extremity amputations) and microvascular (eg, diabetic retinopathy and nephropathy) diseases [2].
In healthy adults, the pancreas maintains blood glucose levels between approximately 70 mg/dL and 180 mg/dL [3] (mostly at the lower end of this range, except for short postprandial increases) by balancing the levels of insulin and glucagon in the bloodstream.
Owing to impaired pancreatic function and/or reduced insulin sensitivity, patients with diabetes face the challenge of maintaining their blood glucose levels within the reference range via exogenous insulin administration, medications, and lifestyle modifications (eg, changes in diet, exercise, sleep patterns). These patients, especially those with type 1 diabetes (whose pancreas produces no insulin at all), must constantly monitor their glycemic state and use exogenous insulin to keep their blood glucose from increasing beyond the healthy range into hyperglycemia, while avoiding out-of-range low (hypoglycemic) values, which can potentially lead to seizures, coma, and even death [4].
The task of blood glucose monitoring, traditionally performed using capillary blood sampling, has been made easier in recent years with the introduction of continuous glucose monitoring (CGM) devices [5], which measure glucose levels at a set frequency, typically every 5 min, via interstitial fluid. Currently, CGM devices are capable of providing an accurate picture of recent and current blood glucose levels and alerting the users of hypo-or hyperglycemic events. Some of the existing devices have incorporated simple autoregression algorithms to predict impending blood glucose fluctuations (usually no more than 15-20 min ahead of time) and issue a notification if a hypo-or hyperglycemic event is expected. However, we believe that the functionality of CGM devices can be significantly extended with additional tools to improve their utility and, consequently, the quality of life of their users.

Current Research on Blood Glucose Predictions
There are two common reasons for making blood glucose predictions. The first is to be able to manage blood glucose levels automatically via a closed-loop feedback system for a continuous insulin pump [6,7]. The second, which is the way in which predictions are used in Diabits, the diabetes management app whose predictive approach and accuracy are reviewed in this publication, is to give the results back to the patient so that their insulin and food intake and other behaviors can be corrected to avoid possible hypo-or hyperglycemia.
Owing to the potential benefits of anticipating blood glucose changes ahead of time, there have been many studies (eg, ) dedicated to developing models capable of short-term (usually in the range of 15-120 min into the future) glycemic predictions. These studies generally fall into 2 categories: (1) physiological approaches [8][9][10][11][12], wherein researchers try to model the metabolic processes within the patient's body using general knowledge of human physiology, and (2) data-driven models , which mostly rely on statistical and machine learning techniques applied to the existing CGM data and other available information (eg, meals, exogenous insulin, sleep, and physical activity) to derive standard patterns of blood glucose behavior, which are then used to predict future glycemic events.
The challenge of using physiological predictive models lies in the fact that to be accurate, these models require a more detailed description of the current state of the patient's body than can normally be achieved, and even in the presence of such data (eg, in a clinical setting), the performance of physiological models is limited because of the inherent complexity of the human glucose-insulin dynamics, which makes identification of model parameters a difficult task. Therefore, data-driven models (or hybrid models that combine statistical methods with physiological insights) are more viable in practice for short-term blood glucose predictions, as evidenced by most studies cited above.

Evaluation of Prediction Accuracy
The accuracy of short-term blood glucose predictions reported in different studies cannot be easily compared, partly because there exists a great variety of metrics that are used by researchers to evaluate predictive performance, such as the root mean square error (RMSE), mean absolute relative difference [42], prediction time lag and the J index [43], and different methods [44][45][46][47] based on using error grids developed for blood glucose meter evaluation, such as the Clarke Error Grid [48] and the Parkes (Consensus) Error Grid [49,50]. More importantly, even with the same metric, glycemic prediction models can exhibit noticeable variation in accuracy when applied to different sets of data owing to the nature of data (in silico or in vivo), the amount of data available for each patient, physiological differences between patients, behavioral changes for each patient, and data quality issues. This variance can be partially reduced by using larger data sets, but for many researchers, only limited data are available owing to the fact that blood glucose readings, similar to all medical data, are usually not shared freely because of patient privacy concerns. Although there have been recent attempts to facilitate blood glucose research by creating established sets of CGM data available to scientists, such as the Ohio T1DM (type 1 diabetes mellitus) data set [51], most studies published to date use private data sets for evaluation, which makes it difficult to objectively evaluate the quality of their results.
Furthermore, the prediction accuracy of different studies may be significantly affected by varied availability of non-CGM data, particularly information related to meal and insulin events. If predictions are only made for periods when no such events occur (which can only be done if the researcher has the data indicating their occurrence), or if these events are taken into account by the predictive model, the accuracy is likely to be much higher than in case of making a prediction for an interval during which unknown events affecting the patient's blood glucose may have taken place.

Feedback Delays and Implications for Predictions
It is important to point out that CGM devices do not measure the actual blood glucose levels but measure the concentration of glucose in interstitial fluid, which tends to follow blood glucose with a patient-and condition-dependent time lag, usually in the range of 5 to 20 min [52][53][54][55]. Although the postprocessing of measured CGM data may partially account for this delay, to avoid out-of-range blood glucose excursions, the predictions need to be made in advance in order for the user (or an automatically controlled insulin pump if the predicted values are used by an artificial pancreas algorithm) to be able to make a correction, while the true blood glucose concentration is still within its reference range.
There are several other sources of delays when using predictions for blood glucose control. Frequently, predictions themselves may be lagging compared with the future interstitial glucose levels because of the nature of the predictive algorithm. Next, CGM devices only perform measurements using discrete time intervals (usually between 3 and 15 min, with 5 min being the most common in practice). Therefore, the last measured point may not be quite up to date at the moment the user sees the prediction. Additional delays are introduced by the CGM filtering algorithms [53]. In addition, the corrective action by the user may not have an immediate effect on blood glucose (eg, even for rapid-acting insulin delivered subcutaneously, the action is delayed by about 5-10 min [56]).
Owing to all these delays, in order for the predictions to be maximally effective in preventing out-of-range blood glucose excursions, it is preferable to anticipate glycemic changes for at least 30 min in advance, especially in cases of hyperglycemic events caused by the delayed action of insulin. For hypoglycemia prediction, shorter time horizons may be acceptable [23], although a longer accurate prediction would still give the user more time to take preventive measures.

Goals
The aim of this paper is to describe how the challenges that exist in blood glucose predictions are addressed in the Diabits smartphone app and to evaluate the accuracy of its predictions and the potential clinical effects of the app using data from the app's users and other existing data sets.

General Description of Diabits
Diabits is a smartphone app that is available both for iOS and Android phones, which reads current blood glucose data either from the app associated with a Dexcom CGM device (via Dexcom Share) or from Nightscout, a cloud-based data aggregator project that can collect, if configured by the user, current data from a Dexcom or Medtronic CGM, and then presents these data in real time to the user, along with predictions of blood glucose behavior for the next 60 min and statistical information and charts based on the patient's past blood glucose data.
The main parts of the user interface of the app are shown in Figure 1. Graph panel (a) is the main screen of the app, displaying the recent CGM data, predicted future blood glucose values, and estimated values of insulin and carbohydrates on board, that is, available for future use by the body. The meal and insulin information, entered manually by each user of the app based on their best knowledge, is displayed in the Journal panel (b). The Analytics panel (c) shows several statistics based on the recent history of the patient's blood glucose. Some of the graphic parts of the design may have experienced minor changes throughout the study. The predictive models of Diabits were originally created on the basis of the results of a clinical study conducted in collaboration with the endocrinology unit of BC Children's Hospital (located in Vancouver, Canada) between April and October 2017 [57]. During this study, CGM data and heart rate and physical activity information of 9 young patients with type 1 diabetes were collected over a period of 2 months with the goal of creating an accurate model for short-term blood glucose predictions. The predictive models that were developed during this study were subsequently refined [58] using data from a larger pool (approximately 1200 people) of free-living users of the app with approximately 1.6 million data points.
The app gives users an option to manually record, according to their knowledge, food consumption (carbohydrate, protein, and fat content and the glycemic index), insulin intake (the number of units and the type of insulin), physical exercise (intensity and duration), and other events that may affect their blood glucose. This information is added to the CGM data as model inputs to increase the prediction accuracy. The predictive models of Diabits rely significantly on CGM inputs, as most users do not provide enough food and insulin information required to make a model that is primarily based on physiological principles. However, all available physiological inputs are taken into account when making a prediction. A schematic diagram of the Diabits prediction approach is shown in Figure 2.

Details of Machine Learning Approach Used in Diabits
Glucose predictions are made via a supervised machine learning framework, with personalized models trained using each patient's past data.
Glucose values are calculated for 4 time points: 15, 30, 45, and 60 min ahead, with a separate model trained for each point. When plotting the data for users, the in-between points are filled using cubic interpolation. Although it is possible to train models for any number of minutes divisible by the CGM time step (eg, for 5, 10, 15 min, if the CGM time step is 5 min), it is not necessary in practice because the actual blood glucose behavior of patients with insulin-dependent diabetes typically lacks a noticeable high-frequency component [59] (even though unfiltered CGM values may exhibit such fluctuations because of random measurement errors).
To create inputs for the model, in addition to CGM data, recent food and insulin records, if available, are used to estimate the amount of carbohydrates and insulin currently present in the body (this information is also displayed for the user to see) and their rates of utilization. The calculations are performed using physiological models similar to those reported in the literature, (eg, [12,60]). As these physiological models have a number of parameters that are specific to each patient, these calculations can only be performed once a sufficient number of previous points with food and insulin data have been collected so that personalized parameters can be estimated from these. Until that point (for newer app users and those who rarely provide such data to the app), a simpler estimation approach for the current amount of carbohydrates and insulin remaining is used based on the food and insulin information reported by the patient, each patient's insulin-to-carbohydrate ratio and correction factor provided to the app at sign-up, and the changes in blood glucose levels since each food and/or insulin event.
Other data points, such as those related to the time of the day, day of the week, and recent physical activity data, are also added as separate model inputs to increase the accuracy of predictions.
The resulting inputs are used for training a model that combines gradient boosted decision trees and SVM regression. Gradient boosted decision trees [61] is an ensemble machine learning technique that works by consecutively training new trees on the differences between the ground truth labels and the combined prediction of all preceding trees. SVM regression [62] operates similar to linear regression, but with a maximum margin (hinge) loss and a kernel mapping that allows to model nonlinear systems. Diabits uses standard implementations of both of these algorithms from open-source Python packages.
The exact mechanism by which these two methods are implemented and combined are not addressed in this paper but may be disclosed in future publications. Generally, the decision tree model is used to evaluate which of the several possible physiological states the patient is currently in, and then an SVM model trained exclusively on the data pertaining to this particular state (as determined by the training algorithm) generates the prediction.
For each Diabits user, the initial personalized (based solely on this user's data) model is built once 2000 CGM points (about a week of continuous data) are available. Thereafter, the model is retrained every 2 weeks to take advantage of the most recent data.

Prediction Adjustments in Diabits
One of the issues that needs to be addressed when predictive models are trained on past patient behavior is that in the absence of detailed nutritional and insulin information for free-living patients, training points may reflect unrecorded prior corrections that the patients have made by either ingesting carbohydrates or using insulin. This is particularly problematic when blood glucose is near the edges of the target range (eg, just above 70 mg/dL or just below 180 mg/dL for the standard reference range of glucose values). A model trained on such data will likely predict similar corrections happening in the future, which may result in the patient actually foregoing necessary corrections owing to the fact that blood glucose is predicted to normalize on its own.
To mitigate this effect, in situations where such errors are likely to occur (ie, in situations with an impending hypo-or hyperglycemic event that the user is likely to have avoided in the past training data by taking food or insulin), Diabits uses an additional algorithm to correct its predictions to generate the most likely trajectory of blood glucose in the absence of future external interventions. The user can then decide, based on their own judgment, if any interventions are necessary. This adjustment is only used when blood glucose is trending toward the outside of the target range, there has been no recent change in the direction of the trend indicating a possible unreported meal or insulin event, and no meal or insulin events have been reported in the last 40 min. The final prediction is generated as a weighted average of the main model's prediction and a prediction that applies linear regression to the recent CGM data and therefore is guaranteed to continue the current trend.
Note that this Diabits adjustment, which typically increases the calculated prediction error (because we are no longer trying to predict what will actually happen, but instead what will happen if no action is taken) but, in our opinion, makes the predictions more practically useful, was not used to ensure a fair comparison in part III of the results of this paper, namely when comparing the prediction accuracy of our model with published research on the Ohio T1DM data set. The results for the actual in-app predictions and glycemic control versus frequency of app use (part I and part II), however, are based on a model that does include this adjustment.

Study Format and Ethical Compliance
All parts of this research are based on retrospective observational cohort studies. The first part (Accuracy of Past In-App Predictions for Free-Living Users) and the second part (Glycemic Control vs Frequency of App Use) analyzed the past data of free-living Diabits users. The researchers, in accordance with the Diabits' privacy policy, had no access to personally identifiable information of the users, relying instead on anonymized randomly generated universally unique identifier strings [63], and had no contact with any of the participants. Thus, we believe that the participants did not fall under the definition of human subjects [64]; hence, no institutional review board review was necessary. Informed consent was received from every Diabits user upon sign-up that their anonymized data could be used for research purposes.
In the third part of the study (Accuracy of Predictions on the 2018 Ohio T1DM Data Set), a publicly available anonymized 2018 Ohio T1DM data set [51] was used. The data user agreement for this data set allows the use of its data for research purposes.

Part I: Accuracy of Past In-App Predictions for Free-Living Users
The goal of this part of the study was to examine a large set of past Diabits predictions made for the actual users of the app and to determine the clinical safety of these predictions using Clarke and Parkes Error Grid analysis. All of Diabits users with type 1 diabetes (as reported by the patients themselves during sign-up) were ranked by the number of blood glucose data points they shared with the app in 2019, and the 500 patients with the most points were chosen for analysis. The sex and age of each specific subject was not known to the researchers; however, in general, there are many Diabits users in all age categories, from newborn to those older than 70 years, and of different sexes (approximately evenly split between males and females). All of the CGM devices used by the study participants were among those compatible with the app (General Description of Diabits). The investigators did not have any further information regarding specific device models for each participant.
The distribution between the Clarke and Parkes Error Grid zones of actual 15-, 30-, 45-, and 60-min predictions made by the app in real time, as compared with the ground-truth data from future CGM points, was calculated using all of the points for these 500 patients where the prediction was made and all of the ground-truth labels were available (6,864,130 total points). The results were examined to determine whether the predictions provided could potentially have led to adverse patient outcomes.

Part II: Glycemic Control Versus Frequency of App Use
The goal of this part of the study was to determine whether there is a correlation between how often the users look at the blood glucose graph of Diabits during each day and their blood glucose control. A total of 280 Diabits users who had at least 180 days of CGM data recorded by the app in 2018 to 2019 were included. The patients came from the same pool as in the first part of the study (in fact, many are the same patients); however, their data from 2 calendar years (2018 and 2019) were used for analysis.
The blood glucose control metrics that were calculated included the average blood glucose and its SD, time in euglycemic range (TIR) [65], glucose management indicator (GMI) [66], and high BGI (HBGI) and low BGI (LBGI) blood glucose risk indices [67].
All of the metrics were analyzed as functions of the frequency of daily use, which was defined as the number of times a Diabits user looked at the graph containing CGM values and future blood glucose predictions during 1 calendar day. Diabits records each user's CGM data as long as the app is running on the smartphone even if the user is not actively looking at the results, so days with zero sessions were included.
The hypothesis of the study was that all of the blood glucose control metrics would improve with more frequent use of the app. All of the users' days were categorized into 4 different groups, namely those with 0 sessions, 1 to 5 sessions, 6 to 10 sessions, and more than 10 sessions. P values, calculated using a one-sided t test, are reported for the difference of each metric from that in the group with zero daily sessions (no active use of the app; P 0 ) and in the closest group with fewer sessions (P fewer ). A value α=.01 was used for the alpha level of significance in all cases, using the Bonferroni correction [68] for multiple comparisons.

Part III: Accuracy of Predictions on the 2018 Ohio T1DM Data Set
To facilitate the comparison of the predictive accuracy of Diabits with existing research, the base Diabits prediction framework was applied without any data set-specific adjustments to the data from the Ohio T1DM data set [51] that was used in 2018 Blood Glucose Level Prediction (BGLP) challenge at the third International Workshop on Knowledge Discovery in Healthcare Data.
Using the training portion of the data in the 2018 Ohio T1DM data set, personalized Diabits models were created for each of the 6 patients in the data set. Next, 30-min predictions were generated for all points in the test portion of the data except for the first hour, and the prediction error (RMSE) was calculated and compared against the published results of the challenge [18,[30][31][32][33]35,40,41].
The CGM data were used as is (no averaging or smoothing to eliminate random errors), and only past and present data (CGM glucose levels, basal and bolus insulin, meal, and exercise information) were used for each point to make predictions. In other words, the data were used in the same manner it is normally used in Diabits, with the training data used to train each patient's personalized prediction models and the test data to generate predictions and calculate their accuracy.

Part I: Accuracy of Past In-App Predictions for Free-Living Users
Actual 30-min Diabits predictions under free-living conditions for the 500 most active patients in 2019 (approximately 6.8 million points) made using personalized models based on the gradient boosted decision trees and the SVM regression algorithm discussed above and evaluated using Parkes Error Grid were found to be 86.89% (5,963,930/6,864,130) clinically accurate (zone A) and 99.56% (6,833,625/6,864,130) clinically acceptable (zones A and B). For the 60-min predictions, the results were 70.56% (4,843,605/6,864,130) clinically accurate and 97.49% (6,692,165/6,864,130) clinically acceptable (Table  1). A sample distribution of predicted values plotted against actual values for both Clarke and Parkes Error Grids is shown in Figure 3.

Part II: Glycemic Control Versus Frequency of App Use
To evaluate the correlation between the daily frequency of Diabits use and the quality of blood glucose control, several commonly used blood glucose control metrics were calculated for 280 users who had at least 180 days of CGM data recorded by the app in 2018 to 2019 (86,973 days combined for all users) as a function of daily number of sessions (ie, the times the user opened the app to look at the blood glucose graph) with Diabits ( Table 2).

Part III: Accuracy of Predictions on the 2018 Ohio T1DM Data Set
The calculated RMSE values for Diabits predictions on the test portion of the 2018 Ohio T1DM data set [51] are presented in Table 3.
Of note, the mean prediction error of the Diabits base model (18.68 mg/dL) is lower than that of all other published results.

Principal Findings
This paper has studied the predictive accuracy of Diabits, a smartphone app that performs blood glucose monitoring based on CGM data, presents a statistical analysis of past data, and generates short-term (up to 60 min) predictions of future glucose behavior. In addition, the correlation between daily use of Diabits and blood glucose control metrics of its users was examined.
A large number of actual predictions made by Diabits for its users were evaluated using the Clarke and Parkes Error Grid, and the resulting values were found to be in the clinically acceptable range 97.49% of the time (6,692,165/6,864,130) for 60-min predictions and 99.56% of the time (6,833,625/6,864,130) for 30-min predictions on the Parkes Grid (with similar results for the Clarke Grid), which showed that the vast majority of predictions were accurate enough to not adversely affect the patients.
By analyzing the results of actual app use, it was statistically established that more frequent daily use of Diabits was correlated with improvement in many blood glucose control metrics, including average blood glucose and its SD, TIR, GMI, and HBGI. This is consistent with the goal of the app to help patients better manage their blood glucose and pre-emptively avoid hyper-or hypoglycemia.
Finally, the accuracy of Diabits was directly compared with that of existing research using predictions on the 2018 Ohio T1DM data set, with the resulting RMSE being lower than that in the studies published by other researchers [18,[30][31][32][33]35,40,41].
All of these results show the viability of Diabits as an effective tool for blood glucose control in CGM users. They also support the quality of the model underlying Diabits to make informative blood glucose predictions based on personalized machine learning models.

Strengths, Limitations, and Possible Future Developments
In part I, the accuracy of the actual glycemic predictions of Diabits was calculated using more than 6.8 million data points. This provided a solid statistical basis for the calculations and ensured the validity of the results.
The combination of gradient boosting decision trees and SVM regression in the Diabits models may have provided an additional ensembling [70] benefit that enhanced the prediction accuracy. In addition, we believe that one of the reasons why Diabits personalized models based on these techniques work particularly well for most patients compared with, for example, neural network models, is the somewhat limited amount of training data available for each patient, which favors the traditional machine learning techniques. However, the downside is that the current personalized approach fails to take advantage of the global pool of data available through the app. One possible future research direction is to use combined data from a large number of patients to train a deep neural network model (which may achieve better accuracy with a large amount of data), and then fine-tune this model for each patient.
In part II, the discovered correlation between the daily use of Diabits and the improvement in blood glucose control metrics was based on more than 86,000 days of app use, once again giving the results statistical significance. However, the observational nature of the study and the lack of knowledge of which, if any, corrections were made by the users based on the app output does not allow us to establish causality or estimate the level of importance of each feature of Diabits, which may be a topic of future research.
In part III, the predictions of Diabits on the 2018 Ohio T1DM data set showed an improved average RMSE for 30-min predictions over other published approaches, demonstrating Diabits' high predictive accuracy when compared with other leading models on the same data set.