This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Diabetes, is properly cited. The complete bibliographic information, a link to the original publication on https://diabetes.jmir.org/, as well as this copyright and license information must be included.
Predictive alerts for impending hypoglycemic events enable persons with type 1 diabetes to take preventive actions and avoid serious consequences.
This study aimed to develop a prediction model for hypoglycemic events with a low false alert rate, high sensitivity and specificity, and good generalizability to new patients and time periods.
Performance improvement by focusing on sustained hypoglycemic events, defined as glucose values less than 70 mg/dL for at least 15 minutes, was explored. Two different modeling approaches were considered: (1) a classification-based method to directly predict sustained hypoglycemic events, and (2) a regression-based prediction of glucose at multiple time points in the prediction horizon and subsequent inference of sustained hypoglycemia. To address the generalizability and robustness of the model, two different validation mechanisms were considered: (1) patient-based validation (model performance was evaluated on new patients), and (2) time-based validation (model performance was evaluated on new time periods).
This study utilized data from 110 patients over 30-90 days comprising 1.6 million continuous glucose monitoring values under normal living conditions. The model accurately predicted sustained events with >97% sensitivity and specificity for both 30- and 60-minute prediction horizons. The false alert rate was kept to <25%. The results were consistent across patient- and time-based validation strategies.
Providing alerts focused on sustained events instead of all hypoglycemic events reduces the false alert rate and improves sensitivity and specificity. It also results in models that have better generalizability to new patients and time periods.
Glucose measurements are critical for effective diabetes management. Real-time continuous glucose monitoring (CGM) devices allow for frequent, automated glucose readings from interstitial fluid in the subcutaneous tissue space. CGM has been shown to improve glycemic control and reduce glycemic excursions—decreasing both hypoglycemia and hyperglycemia [
Since the first attempt at predicting future glucose values based on CGM data in 1999 [
Predictive hypoglycemia alerts have the potential to be extremely helpful in reducing hypoglycemia risk; however, false alerts have been a major hindrance to the acceptance of predictive hypoglycemia alerts among users [
Previous studies have found that hypoglycemia prediction model performance is reduced when applied to new patients and different time periods [
Thus, despite the many advances made in terms of hypoglycemia prediction models, the shortcoming of a high FAR makes the alerts ill-suited for real-world application [
The CGM data sets were obtained from 110 pediatric patients with type 1 diabetes over 30 to 90 days. The data comprised over 1.6 million CGM values under normal living conditions. Dexcom G6 CGM devices were used to collect the CGM readings. The cohort-level profile of patients in this study can be found in
Demographic and diabetes profile of patients enrolled in the study.
| Characteristic | Mean (SD) | Minimum | Maximum |
|---|---|---|---|
| Age (years) | 12.67 (4.84) | 1 | 21 |
| Glycated hemoglobin A1c (%) | 7.70 (1.63) | 5.00 | 12.50 |
| Duration of diabetes (years) | 4.93 (4.09) | 0.25 | 19.18 |
| Number of hypoglycemic values per day per patient | 6.20 (5.98) | 0.10 | 23.73 |
| Percentage of hypoglycemic values below 70 mg/dL | 2.13 (2.10) | 0.50 | 12.20 |
A glucose threshold of 70 mg/dL is used to identify the hypoglycemic range [
Frequency distribution of the duration of hypoglycemic events (n=6010).
| Duration of consecutive CGM values below 70 mg/dL (minutes) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | >45 |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency of hypoglycemic events, n (%) | 572 (9.51) | 796 (13.24) | 885 (14.73) | 842 (14.01) | 676 (11.25) | 562 (9.35) | 391 (6.51) | 283 (4.71) | 209 (3.48) | 794 (13.21) |
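The following minimal sketch shows how hypoglycemic events and their durations can be extracted from a CGM trace. It treats an event as a run of consecutive readings below 70 mg/dL and calls it sustained when the run lasts at least 15 minutes. The sketch assumes evenly spaced readings and counts each reading as 5 minutes, which matches the 5-minute bins in the table above. Both function names are hypothetical and are not taken from the study's code.

```python
def hypoglycemic_event_durations(cgm, threshold=70, interval_min=5):
    """Durations (minutes) of runs of consecutive CGM readings below `threshold`.

    Assumes evenly spaced readings, each counted as `interval_min` minutes.
    """
    durations, run = [], 0
    for glucose in cgm:
        if glucose < threshold:
            run += 1
        elif run:
            durations.append(run * interval_min)
            run = 0
    if run:  # close a run that lasts until the end of the trace
        durations.append(run * interval_min)
    return durations


def split_sustained_transient(durations, sustained_min=15):
    """Sustained: at least `sustained_min` minutes; transient: anything shorter."""
    sustained = [d for d in durations if d >= sustained_min]
    transient = [d for d in durations if d < sustained_min]
    return sustained, transient
```

Under this convention, a 15-minute event is 3 consecutive readings. That is the same run length used later to infer sustained hypoglycemia from predicted glucose values.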
We define the following metrics for evaluating model performance: sensitivity, specificity, and FAR.
Sensitivity measures the proportion of actual positives that are correctly identified. It is also known as the true-positive rate:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

where TP are the true positives and FN are the false negatives.
Specificity measures the proportion of actual negatives that are correctly identified. It is also known as the true-negative rate:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

where TN are the true negatives and FP are the false positives.
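The two definitions can be checked on a toy set of labels. The labels below are illustrative, not study data, and the helper names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for paired truthy labels (1/True = hypoglycemic event)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)  # true-positive rate


def specificity(tn, fp):
    return tn / (tn + fp)  # true-negative rate
```

For `actual = [1, 1, 1, 0, 0, 0, 0, 1]` and `predicted = [1, 1, 0, 0, 0, 1, 0, 1]`, there are 3 TPs, 3 TNs, 1 FP, and 1 FN, so both sensitivity and specificity equal 0.75.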
FAR was defined based on the definition provided by Mosquera-Lopez et al [
Random forest (RF) is a nonparametric approach that produces an ensemble prediction from a “forest” of regression trees, each grown on a bootstrap sample of the training data. Model predictions are obtained as the mean of the predictions of the individual trees. RF handles nonlinear relationships among variables well and makes no assumptions about data distributions. Owing to these characteristics, RF-based machine learning models performed well in our previous work [
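The bootstrap-and-average idea behind RF can be illustrated with a deliberately simplified stand-in: bagged one-split regression trees (stumps) on a single input. This is not a full RF. A real RF grows deeper trees and also samples a random subset of features at each split. The function names here are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (one split) on 1-D inputs by minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:  # splitting at the maximum leaves the right side empty
            continue
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all inputs identical: fall back to the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Fit each stump on a bootstrap sample; predict with the mean over all stumps."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(s(x) for s in stumps)
```

Averaging over many bootstrap-trained trees reduces the variance of any single tree. This is the property the ensemble relies on.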
For the multistep prediction approach, future CGM values were predicted using quantile regression forests (QRFs). The concept of quantile regression was introduced by Koenker and Hallock [
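A QRF differs from an ordinary RF in how it reads out a prediction. Instead of averaging tree outputs, it weights each training sample by how often it shares a leaf with the query point. It then takes a quantile of the resulting weighted distribution of training responses. The sketch below assumes the leaf assignments are already known, whereas in practice they come from trained trees. It does not state which quantile the study used. The helper names are hypothetical.

```python
def qrf_weights(train_leaves, query_leaves):
    """Leaf co-membership weights over training samples for one query point.

    train_leaves[t][i]: leaf that training sample i falls into in tree t.
    query_leaves[t]: leaf that the query point falls into in tree t.
    """
    n_trees = len(train_leaves)
    n = len(train_leaves[0])
    weights = [0.0] * n
    for t in range(n_trees):
        same_leaf = [i for i in range(n) if train_leaves[t][i] == query_leaves[t]]
        for i in same_leaf:
            weights[i] += 1.0 / (len(same_leaf) * n_trees)
    total = sum(weights)
    if total == 0:
        raise ValueError("query point shares no leaf with any training sample")
    return [w / total for w in weights]


def weighted_quantile(ys, weights, alpha):
    """Smallest response at which the weighted empirical CDF reaches alpha."""
    cum = 0.0
    for y, w in sorted(zip(ys, weights)):
        cum += w
        if cum >= alpha - 1e-12:  # small tolerance for floating-point round-off
            return y
    return max(ys)
```

Take 2 trees, training glucose values `[60, 65, 80, 90]`, leaves `[[0, 0, 1, 1], [0, 1, 1, 1]]`, and query leaves `[0, 1]`. The weights are about `[0.25, 0.42, 0.17, 0.17]`, so the median estimate is 65 and the 0.9 quantile is 90.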
QRFs were used as a multistep forecasting method to predict the glucose values for every 5-minute interval in the prediction horizon (PH). This resulted in 6 predictions for the 30-minute PH and 12 predictions for the 60-minute PH. Based on these predictions, a sustained hypoglycemic event was detected if 3 or more consecutive predicted CGM values were <70 mg/dL.
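The detection rule above can be sketched as a scan for a run of at least 3 consecutive predicted values below 70 mg/dL. The sketch assumes the predictions arrive as a list ordered in time at 5-minute steps. The function name is hypothetical.

```python
def predicts_sustained_hypoglycemia(predicted, threshold=70.0, min_consecutive=3):
    """True if `min_consecutive` or more consecutive predicted values are below `threshold`."""
    run = 0
    for glucose in predicted:
        run = run + 1 if glucose < threshold else 0
        if run >= min_consecutive:
            return True
    return False


STEPS_30MIN = 30 // 5  # 6 predictions at 5-minute intervals
STEPS_60MIN = 60 // 5  # 12 predictions at 5-minute intervals
```

For a 30-minute PH, `[90, 80, 69, 68, 67, 72]` raises an alert. `[90, 69, 68, 75, 66, 65]` does not, because neither run below 70 mg/dL reaches 3 values.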
An appropriate validation mechanism is critical to assess the performance of a machine learning model [
In the patient-based validation approach, the prediction model was developed on one subset of patients and validated on a different set of patients. Of the 110 patients, 70 (approximately 65% of the data) were randomly selected for training, and the remaining 40 were used for performance evaluation. The reported model performance is the mean of 5 replications of this random split (approximately 65% training and 35% validation data).
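A minimal sketch of the repeated patient-level split follows, assuming patients are identified by integer IDs. The helper names are hypothetical. The key property is that every reading of a given patient lands on one side only.

```python
import random


def patient_based_splits(patient_ids, n_train=70, n_replications=5, seed=0):
    """Return `n_replications` random (train, validation) partitions of the patients.

    Whole patients are assigned to one side, so validation patients are never
    seen during training.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_replications):
        shuffled = list(patient_ids)
        rng.shuffle(shuffled)
        splits.append((sorted(shuffled[:n_train]), sorted(shuffled[n_train:])))
    return splits


def mean_over_replications(scores):
    """Average a metric (eg, sensitivity) over the replications."""
    return sum(scores) / len(scores)
```

Calling `patient_based_splits(range(110))` yields 5 partitions of 70 training and 40 validation patients. A model would be trained and scored once per partition, and the scores averaged.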
In the time-based validation approach, the first 70% of each patient's data was used for model training and the last 30% for validation, for all 110 patients. The reported performance is the average over the validation data of all 110 patients.
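The chronological split can be sketched as follows, assuming each patient's readings are already in time order. The helper names are hypothetical. Unlike a random split, no future reading of a patient can leak into that patient's training data.

```python
def time_based_split(readings, train_percent=70):
    """Chronological split: first `train_percent`% for training, the rest for validation."""
    cut = len(readings) * train_percent // 100  # integer arithmetic avoids float rounding
    return readings[:cut], readings[cut:]


def split_every_patient(readings_by_patient, train_percent=70):
    """Apply the chronological split separately to each patient's time-ordered readings."""
    return {pid: time_based_split(r, train_percent) for pid, r in readings_by_patient.items()}
```

For 10 readings, the first 7 go to training and the last 3 to validation.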
A rich combination of demographic, dynamic, snowball, interaction, and contextual features was extracted from the data. An optimal set of features for hypoglycemia prediction was identified in our previous work [
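The final model reportedly uses only the past 4 hours of CGM data plus hour-of-day and day-of-week context. The sketch below builds a hypothetical feature vector of that kind. It is an illustrative subset, not the authors' feature set. The "snowball-style" features, taken here as accumulated rises and falls over the last 30 minutes, are an assumed form. The sketch also assumes one reading every 5 minutes.

```python
from datetime import datetime

READINGS_PER_HOUR = 12  # assumes one CGM reading every 5 minutes
WINDOW = 4 * READINGS_PER_HOUR  # the past 4 hours = 48 readings


def extract_features(window, timestamp, hypo_threshold=70):
    """Hypothetical feature vector for one prediction time (not the authors' feature set)."""
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} readings, got {len(window)}")
    last_30min = window[-7:]  # 7 readings at 5-minute spacing span 30 minutes
    steps = [b - a for a, b in zip(last_30min, last_30min[1:])]
    return {
        # dynamic features: current level and recent trend
        "glucose_now": window[-1],
        "change_15min": window[-1] - window[-4],
        "change_30min": window[-1] - window[-7],
        "min_4h": min(window),
        "mean_4h": sum(window) / len(window),
        "n_below_threshold_4h": sum(1 for g in window if g < hypo_threshold),
        # snowball-style features (assumed form): accumulated rises and falls
        "sum_rises_30min": sum(s for s in steps if s > 0),
        "sum_falls_30min": sum(-s for s in steps if s < 0),
        # contextual features
        "hour_of_day": timestamp.hour,
        "day_of_week": timestamp.weekday(),  # Monday = 0
    }


example = extract_features(list(range(100, 148)), datetime(2021, 3, 1, 14, 30))
```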
In the patient-based validation approach, the QRF method substantially outperformed the RF method in sensitivity for both PHs: 99.09% vs 39.11% for the 30-minute PH and 97.61% vs 49.27% for the 60-minute PH. Specificity (>98%) and FAR (approximately 26%) were comparable between the two methods. These results indicate that the sustained hypoglycemia model developed using QRFs is generic and can be applied to new patients without performance degradation.
In the time-based validation setting, the RF method performed well for both the 30-minute and 60-minute PHs, with high sensitivity, high specificity, and low FAR. The QRF method still achieved higher sensitivity: 98.94% vs 96.17% for the 30-minute PH and 97.91% vs 95.34% for the 60-minute PH. The time-based validation results indicate that both models retain their performance when applied to new time periods, as would be the case after deployment.
Comparison of model performance based on sensitivity, specificity, and false alerts with patient-based and time-based validation for 30-minute and 60-minute prediction horizons (PHs).
| Metric | Patient-based, 30-minute PH, method 1: RF | Patient-based, 30-minute PH, method 2: QRF | Patient-based, 60-minute PH, method 1: RF | Patient-based, 60-minute PH, method 2: QRF | Time-based, 30-minute PH, method 1: RF | Time-based, 30-minute PH, method 2: QRF | Time-based, 60-minute PH, method 1: RF | Time-based, 60-minute PH, method 2: QRF |
|---|---|---|---|---|---|---|---|---|
| Sensitivity, % (SD) | 39.11 (2.25) | 99.09 (0.16) | 49.27 (3.03) | 97.61 (0.41) | 96.17 | 98.94 | 95.34 | 97.91 |
| Specificity, % (SD) | 98.65 (0.09) | 98.19 (0.10) | 98.63 (0.12) | 98.09 (0.11) | 98.3 | 98.29 | 97.95 | 98.20 |
| False alerts, n (SD), considering transient and nonhypoglycemic events as false | 6936 (356) | 9339 (459) | 7043 (317) | 9672 (431) | 6476 | 8211 | 7346 | 8465 |
| False alerts, n (SD), considering only nonhypoglycemic events as false | 3907 (200) | 5368 (162) | 4109 (156) | 5677 (201) | 3324 | 4531 | 4334 | 4799 |
| False alert rate, % (SD) | 26.32 (2.56) | 26.50 (2.41) | 26.44 (2.37) | 26.36 (2.57) | 22.79 | 23.86 | 26.41 | 23.79 |

RF: random forest; QRF: quantile regression forest.
Comparison of model performance based on sensitivity, specificity, and false alert rate with different characterizations of hypoglycemic events and different validation strategies (patient-based and time-based) for giving predictive alerts.
| Model | 30-minute PH: sensitivity (%) | 30-minute PH: specificity (%) | 30-minute PH: false alert rate (%) | 60-minute PH: sensitivity (%) | 60-minute PH: specificity (%) | 60-minute PH: false alert rate (%) |
|---|---|---|---|---|---|---|
| All hypoglycemic events prediction (5-fold validation) | 93.61 | 93.50 | 84.94 | 91.01 | 89.82 | 77.20 |
| All hypoglycemic events prediction (new time periods) | 87.10 | 92.66 | 85.16 | 73.87 | 87.29 | 79.81 |
| All hypoglycemic events prediction (new patients) | 87.60 | 92.47 | 75.20 | 73.79 | 87.06 | 71.50 |
| Sustained hypoglycemic events prediction (QRF, new patients) | 99.08 | 97.79 | 30.00 | 98.13 | 97.58 | 30.19 |
| Sustained hypoglycemic events prediction (QRF, new time periods) | 98.54 | 98.57 | 22.36 | 97.72 | 98.49 | 22.44 |

QRF: quantile regression forest.
A graphical comparison between the classifiers at different threshold values using receiver operating characteristic (ROC) curves can be found in
Performance of the quantile regression forest model at different thresholds and the average time to predict a hypoglycemic event.
| Metric | 30-minute PH, threshold 1 | 30-minute PH, threshold 2 | 30-minute PH, threshold 3 | 60-minute PH, threshold 1 | 60-minute PH, threshold 2 | 60-minute PH, threshold 3 |
|---|---|---|---|---|---|---|
| Sensitivity (%) | 98.54 | 99.27 | 99.51 | 97.72 | 98.29 | 98.99 |
| Specificity (%) | 98.57 | 97.56 | 96.68 | 98.49 | 97.06 | 95.53 |
| False alerts (n) | 6932 | 11,960 | 16,049 | 7215 | 14,027 | 21,297 |
| False alerts with transient events as positives (n) | 3736 | 8149 | 12,007 | 3974 | 9956 | 16,775 |
| False alert rate (%) | 22.36 | 35.36 | 43.34 | 22.44 | 37.19 | 46.96 |
| Average time to predict an event (minutes) | 18.78 | 22.95 | 26.51 | 25.24 | 35.08 | 48.35 |
We present a robust prediction model for providing high-quality alerts for sustained hypoglycemic risk in patients with type 1 diabetes. The final model (QRF model) was shown to be robust to two validation approaches that best represent real-world application scenarios (new patients and new time periods). The primary research contributions of this work are (1) a prediction model that focuses on sustained hypoglycemic events and achieves high sensitivity, high specificity, and a low FAR; and (2) improved generalizability of the model to new patients and new time periods. The model uses only CGM data from the past 4 hours and contextual information about the current hour of the day and day of the week to make predictions. A methodological contribution is the use of glucose predictions at multiple time points to infer sustained hypoglycemia. The model was built using data collected from 110 patients over 30 to 90 days under normal living conditions, which supports the validity of the results. The QRF model had sensitivity and specificity >97% for both 30- and 60-minute PHs. The FAR was also kept low, at 22.36% and 22.44% for the 30-minute and 60-minute PHs, respectively, under time-based validation, which is expected to improve user trust in and adoption of CGM-based alerts.
A comparative analysis of different hypoglycemia prediction methodologies can be found in the literature [
In machine learning, a standard approach to validate prediction models is to split the data into a training set (to train the model) and a validation set (to evaluate model performance) [
Mosquera-Lopez et al [
Dave et al [
Having an accurate and actionable hypoglycemia prediction model with low FARs is essential to the durability of CGM in diabetes management. Furthermore, a patient-facing hypoglycemia prediction algorithm may give patients the confidence to aim for in-range glucose values without fear of hypoglycemia, potentially leading to lower glycated hemoglobin A1c (HbA1c) values and increased time in range. Of note, 22.3% of patients analyzed were using sensor-augmented pump therapy with a predictive low-glucose suspend feature (ie, Basal-IQ technology). Patients using this system are still at risk for hypoglycemia because of insulin on board, exercise, overdosing on carbohydrates, and/or hyperglycemia, so a notification for predicted hypoglycemia using advanced machine learning models with good performance could still be clinically useful.
A limitation of our approach is that transient hypoglycemic events were ignored when generating alerts. Ignoring transient events helped the machine learning model learn the more stable patterns of sustained events. Even though the alerts targeted sustained events, 61% of transient events still triggered an alert and were therefore counted as FPs. The remaining 39% of transient events (representing 13% of all hypoglycemic events) were not detected. We consider this trade-off justified because transient events are less serious than sustained hypoglycemic events. Transient events may arise from random variations in glycemic levels (ie, noise) or from a temporal lag in the effect of an intervention taken by the patient (eg, consuming fast-acting carbohydrates). The improved FAR, sensitivity, specificity, and generalizability of the sustained hypoglycemia model presented in this paper support this trade-off.
This study was based on pediatric patients with type 1 diabetes aged 1 to 21 years who used Dexcom G6 CGM devices. As such, the results apply directly to this population. The model may need to be recalibrated for other CGM devices such as the Guardian (Medtronic) or FreeStyle Libre (Abbott Laboratories Co.). However, the performance measures are expected to generalize to other platforms provided the accuracy and frequency of incoming glucose readings remain the same. Similarly, although no activity profile specific to pediatric patients was explicitly used in model development, the model may need to be recalibrated for an adult cohort by retraining on adult CGM data [
Providing predictive alerts for hypoglycemia focused on sustained events instead of all hypoglycemic events reduces FARs and improves sensitivity and specificity. It also results in models that have better generalizability to new patients and time periods. This has important implications for sustaining CGM use and optimizing glycemic control with fewer hypoglycemic events, improved confidence, and potentially lower HbA1c. To that end, the predictive model presented in this paper will be implemented in a smartphone app in an upcoming clinical pilot study at Texas Children’s Hospital.
Patient hypoglycemia profile.
Patient pump profile.
Breakdown of sustained events by daytime and nighttime.
Breakdown of sustained and transient events.
Breakdown of sustained events.
Features extracted for prediction.
Receiver operating characteristic plot showing a comparison between different classifiers for giving out predictive alerts for 30-minute (top) and 60-minute (bottom) prediction horizons.
ARIMA: autoregressive integrated moving average
CGM: continuous glucose monitoring
FAR: false alert rate
FDA: Food and Drug Administration
FN: false negatives
FP: false positives
HbA1c: glycated hemoglobin A1c
PH: prediction horizon
QRF: quantile regression forest
RF: random forest
RMSE: root-mean-square error
ROC: receiver operating characteristic
TN: true negatives
TP: true positives
This study was supported by FDA P50 Pediatric Device Consortia Grant #5P50FD006428 (SWPDC) (Dr Koh, Contact Principal Investigator). The authors thank Achu Byju, Department of Biomedical Engineering, Texas A&M University, for designing the overall architecture of the data collection system.
This study is a secondary analysis of deidentified data that were not collected specifically for this project and is not human subjects research (Texas A&M IRB number 2019-0710).
DDave, ME, and ML conceived the idea for the study. DDave implemented the model and analyzed the data. DDave and ME wrote the manuscript. All authors provided input and helped in revising the manuscript.
DDeSalvo serves as an independent consultant for Dexcom.