Machine Learning–Derived Prenatal Predictive Risk Model to Guide Intervention and Prevent the Progression of Gestational Diabetes Mellitus to Type 2 Diabetes: Prediction Model Development Study

Background The increasing prevalence of gestational diabetes mellitus (GDM) is concerning as women with GDM are at high risk of type 2 diabetes (T2D) later in life. The magnitude of this risk highlights the importance of early intervention to prevent the progression of GDM to T2D. Rates of postpartum screening are suboptimal, often as low as 13% in Asian countries. The lack of preventive care through structured postpartum screening in several health care systems and low public awareness are key barriers to postpartum diabetes screening. Objective In this study, we developed a machine learning model for early prediction of postpartum T2D following routine antenatal GDM screening. The early prediction of postpartum T2D during prenatal care would enable the implementation of effective strategies for diabetes prevention interventions. To our best knowledge, this is the first study that uses machine learning for postpartum T2D risk assessment in antenatal populations of Asian origin. Methods Prospective multiethnic data (Chinese, Malay, and Indian ethnicities) from 561 pregnancies in Singapore’s most deeply phenotyped mother-offspring cohort study—Growing Up in Singapore Towards healthy Outcomes—were used for predictive modeling. The feature variables included were demographics, medical or obstetric history, physical measures, lifestyle information, and GDM diagnosis. Shapley values were combined with CatBoost tree ensembles to perform feature selection. Our game theoretical approach for predictive analytics enables population subtyping and pattern discovery for data-driven precision care. The predictive models were trained using 4 machine learning algorithms: logistic regression, support vector machine, CatBoost gradient boosting, and artificial neural network. We used 5-fold stratified cross-validation to preserve the same proportion of T2D cases in each fold. Grid search pipelines were built to evaluate the best performing hyperparameters. Results A high performance prediction model for postpartum T2D comprising of 2 midgestation features—midpregnancy BMI after gestational weight gain and diagnosis of GDM—was developed (BMI_GDM CatBoost model: AUC=0.86, 95% CI 0.72-0.99). Prepregnancy BMI alone was inadequate in predicting postpartum T2D risk (ppBMI CatBoost model: AUC=0.62, 95% CI 0.39-0.86). A 2-hour postprandial glucose test (BMI_2hour CatBoost model: AUC=0.86, 95% CI 0.76-0.96) showed a stronger postpartum T2D risk prediction effect compared to fasting glucose test (BMI_Fasting CatBoost model: AUC=0.76, 95% CI 0.61-0.91). The BMI_GDM model was also robust when using a modified 2-point International Association of the Diabetes and Pregnancy Study Groups (IADPSG) 2018 criteria for GDM diagnosis (BMI_GDM2 CatBoost model: AUC=0.84, 95% CI 0.72-0.97). Total gestational weight gain was inversely associated with postpartum T2D outcome, independent of prepregnancy BMI and diagnosis of GDM (P=.02; OR 0.88, 95% CI 0.79-0.98). Conclusions Midgestation weight gain effects, combined with the metabolic derangements underlying GDM during pregnancy, signal future T2D risk in Singaporean women. Further studies will be required to examine the influence of metabolic adaptations in pregnancy on postpartum maternal metabolic health outcomes. The state-of-the-art machine learning model can be leveraged as a rapid risk stratification tool during prenatal care. Trial Registration ClinicalTrials.gov NCT01174875; https://clinicaltrials.gov/ct2/show/NCT01174875


Introduction
The prevalence of gestational diabetes mellitus (GDM) is increasing globally, with 1 in 6 pregnancies being affected [1]. GDM has long-term implications as women with a history of GDM have a 10-fold higher risk of developing type 2 diabetes (T2D) compared to those with a normoglycemic pregnancy [2]. In the Growing Up in Singapore Towards healthy Outcomes (GUSTO) study, women with GDM had a 12-fold higher risk of developing T2D 4-6 years after delivery compared with women who did not have GDM [3]. From a public health perspective, early intervention in women with GDM could contribute to tackling the rising global health burden of T2D. The T2D epidemic is of particular concern in Southeast Asia; 88 million adults are currently living with diabetes, but this is expected to increase to 153 million by 2045 [1]. Moreover, 57% of the population with diabetes in Southeast Asia are undiagnosed, increasing the risk of complications such as heart disease and stroke [1].
The American Diabetes Association guidelines recommend that women with GDM are tested 4-12 weeks postpartum using a 75 g oral glucose tolerance test (OGTT) [4]. Further testing is recommended in those with normal postpartum OGTT every 1-3 years using fasting plasma glucose, hemoglobin A 1c or HbA 1c , or an OGTT [4]. However, as GDM resolves post pregnancy, postpartum surveillance of glycemia remains low across health care systems globally. The rate of postpartum diabetes screening can be as low as 13% in Asian countries [5]. Barriers to postpartum diabetes screening include lack of structured postpartum preventive care in health care systems, lack of patient awareness of future T2D risk, and time restrictions due to maternal commitments [5,6].
Machine learning models enable predictive population risk stratification. In a prospective metabolomics study by Allalou et al [7], 21 metabolites were identified at 6-9 weeks post partum to predict the transition from GDM to T2D in women. The metabolite model using decision trees performed well with an area under the receiver operating characteristic curve (AUC) of 0.77. In another GDM to T2D transition study by Joglekar et al [8], the inclusion of circulating microRNA (miR-369-3p) at 12 weeks post partum enhanced the prediction of a clinical model (age, BMI, pregnancy fasting glucose, postpartum fasting glucose, cholesterol, and triacylglycerols) from an AUC of 0.83 to an AUC of 0.92 (logistic regression algorithm). In addition to low compliance of postpartum testing in women with GDM, the other barriers to the real-world implementation of these 2 machine learning models include the cost and access to metabolomics assay and microRNA polymerase chain reaction during routine clinical visits.
The early prediction of postpartum T2D during prenatal care would enable the implementation of effective strategies for diabetes prevention interventions. To date, there have been no studies on using machine learning for postpartum T2D risk assessment in antenatal populations of Asian origin. In this study from Singapore, we developed a machine learning model for early prediction of postpartum T2D during routine antenatal GDM screening. Our machine learning model was implemented using the prospective GUSTO cohort study data (NCT01174875).

Ethics Approval
This study has been reviewed by the National Healthcare Group Domain Specific Review Board for ethics approval and SingHealth Centralized Institutional Review Board (CIRB/E/2019/2655).

Study Design
GUSTO is a prospective multiethnic (Chinese, Malay, and Indian ethnicities) mother-offspring cohort study. Mothers were recruited during early pregnancy from Singapore's 2 major public maternity hospitals, National University Hospital and KK Women's and Children's Hospital, between June 2009 and October 2010.
Participants of mixed ethnicity or with self-reported T2D at recruitment were excluded from model training. A total of 561 mothers had complete data on demographics, medical or obstetric history, physical measures, lifestyle information, antenatal OGTT, and postpartum OGTT 4-8 years after delivery. The World Health Organization (WHO) 1999 criteria [9] were used to diagnose GDM, and the WHO 2006 criteria [10] were used to diagnose postpartum impaired glucose tolerance (IGT), impaired fasting glucose (IFG), and T2D. The abnormal glucose metabolism (AGM) outcome comprises of IGT, IFG, and T2D diagnoses.

Feature Variables
Information on demographics (maternal age, maternal ethnicity) and medical or obstetric history (self-reported prepregnancy weight, family history of diabetes mellitus, family history of high blood pressure, family history of cardiovascular disease, previous history of GDM, previous history of gestational hypertension, and parity) were derived from first trimester questionnaires. Systolic and diastolic blood pressure were recorded at midgestation (median 26.7, IQR 26.1-27.6 weeks) and obtained from hospital case notes. Mean arterial blood pressure was derived by doubling the diastolic blood pressure and adding to the systolic blood pressure, with the composite sum divided by 3. Maternal anthropometry was measured at midgestation (median 26.9, IQR 26.4-27.6 weeks). Maternal midupper arm circumference was measured to the nearest 0.1 cm, midway between acromion process and olecranon process (using Seca 212). Maternal height was measured to the nearest 0.1 cm (using Seca 213). Maternal weight at midpregnancy was measured to the nearest 0.1 kg (using Seca 803), and BMI was derived using weight divided by height squared (kg/m 2 ). Total gestational weight gain was derived by subtracting first antenatal visit weight (median 9.0, IQR 7.3-11.0 weeks) from the last antenatal visit weight (median 38.1, IQR 37.3-39.1 weeks).
Lifestyle information on self-reported smoking, environmental tobacco smoke exposures, and alcohol consumption were collected using questionnaires. GDM diagnosis was based on antenatal OGTT assessment (median 26.9, IQR 26.4-27.7 weeks).

Machine Learning Methodology and Statistical Analyses
Our methodological novelty lies in combining coalitional game theory concepts with machine learning. SHapley Additive exPlanations (SHAP) framework was combined with CatBoost tree ensembles for feature selection and model explainability [11,12]. The SHAP framework connects optimal credit allocation with local explanations using the classic Shapley values from cooperative game theory. Lundberg and Lee [11] have proposed SHAP as the only additive feature attribution method that satisfies 2 important properties of game theory-additivity (local accuracy) and monotonicity (consistency). In game theory, Shapley value is the average expected marginal contribution of 1 player across all possible permutation of players (ie, the average effects of team member composition and team size). Shapley value helps determine a payoff for all the game players when each player might have contributed more or less than the others when working in coalition. In machine learning, the game players are the features, and the collective payout is the model prediction. SHAP framework provides local explanations based on exact Shapley values to understand the global model structure. For each possible feature ordering, features are introduced one at a time into a conditional expectation function of the model's output, and changes in expectation are attributed to the introduced feature, averaged over all possible feature orderings in a fair manner. SHAP values represent a change in log odds ratio. Our game theoretical approach for predictive analytics enables population subtyping and pattern discovery for data-driven precision care.
The supervised machine learning models were built using Anaconda distribution of Python programming language (version 3.7.9) in JupyterLab computational environment. The predictive models were trained using the following 4 machine learning algorithms to address algorithm bias: logistic regression (generalized linear model), support vector machine (linear support vector classification), CatBoost gradient boosting (tree-based), and artificial neural network (multilayer perceptron). We used 5-fold stratified cross-validation to preserve the same proportion of AGM/T2D cases in each fold. Maximum absolute scaler was used as a preprocessor to scale each feature without destroying the sparsity. A grid search pipeline was built to evaluate the best performing hyperparameters for each machine learning model. Model performances were evaluated using the AUC with 95% CI. Implementation details of the machine learning algorithms are included in Multimedia Appendix 1.
The feature selection model using clinical features at midgestation was trained on the AGM outcome, and top predictors with SHAP value magnitudes more than zero were included in the AGM/T2D prediction models. Sensitivity analyses were performed to explore the prediction effects of diagnosing GDM using modified 2-point International Association of the Diabetes and Pregnancy Study Groups (IADPSG) 2018 criteria [9] rather than WHO 1999 criteria (GUSTO study did not include a 1-hour glucose measurement), and the prediction effects of continuous fasting or 2-hour glucose measures and prepregnancy BMI. We also assessed the associations between total gestational weight gain and postpartum AGM and T2D outcomes. All association analyses were performed using Stata/MP software (version 16.1; StataCorp LP).

The Features Significantly Associated With T2D Aligned With the Top Features From the SHAP Feature Selection Model
The relationship between all feature variables and postpartum AGM and T2D outcomes is presented in a Pearson correlation heatmap (Figures 1 and 2). Diagnosis of GDM, midupper arm circumference, and BMI are the best features for postpartum AGM/T2D machine learning model building. Table 1 presents the univariate associations between midpregnancy features and postpartum AGM and T2D outcomes. Previous history of GDM, mean arterial blood pressure, midupper arm circumference, BMI, and diagnosis of GDM were associated with later risk of T2D. The top 4 features impacting the SHAP model outputs were midupper arm circumference, mean arterial blood pressure, BMI and diagnosis of GDM (Figure 3). The negative SHAP value for height implies that maternal height did not contribute to the prediction of AGM.

Maternal Adiposity During Pregnancy and Metabolic Derangements Underlying GDM Signaling Future T2D Risk
Although the detailed training parameters and results for all machine learning models are shown in Tables S1-S6 (Multimedia Appendix 2), we focus on describing the results of CatBoost machine learning models as this algorithm had the best overall performance. The results for each data set of the 5-fold stratified cross-validation and the average of the cross-validation are also provided in Tables S1-S6  Although there is a high correlation between midupper arm circumference and BMI (r=0.91), BMI is more reliably and commonly assessed in clinical settings, and therefore, a BMI-based pregnancy model is our proposed solution ( Figure  4). Table 2 summarizes the detailed training parameters of logistic regression, support vector machine, artificial neural network, and CatBoost gradient boosting algorithms, as well as the results of the proposed postpartum T2D predictive model (comprising of midpregnancy BMI after gestational weight gain and diagnosis of GDM features). Total gestational weight gain was inversely associated with postpartum AGM and T2D outcomes, independent of prepregnancy BMI and diagnosis of GDM (Table 3).

Principal Results
We have built an effective postpartum T2D predictive model by combining game theory-based feature selection with machine learning. SHAP values recovered predictive modeling features for optimal performance, aligning model interpretability with human intuition. Our BMI_GDM model achieved an excellent AUC of 0.86 with 2 midgestation features (BMI at midgestation and diagnosis of GDM by the WHO 1999 criteria) for an early prediction of postpartum T2D risk in a Singapore population. The model was also robust when using a modified 2-point IADPSG 2018 criteria for GDM diagnosis (AUC=0.84). The BMI_GDM machine learning model can be leveraged as a risk stratification tool during routine GDM screening to identify Asian women at high risk of developing T2D, enabling early intervention. The BMI_2hour model (AUC=0.86) can be an alternative design during clinical implementation if GDM diagnosis feature is unavailable for the patient. The trained classifier can be deployed using a web application that can allow clinicians to identify women at T2D risk and develop a postpartum management plan.
The 2-feature midpregnancy BMI model (AUC=0.86) performed better in postpartum T2D prediction than a preconception BMI model (AUC=0.62), implying that midgestational weight gain effects combined with the metabolic derangements underlying GDM and fetoplacental unit signal future T2D risk. As pregnancy has a diabetogenic effect on metabolism [13], further studies will be required to examine the metabolic adaptations in pregnancy and postpartum maternal metabolic health outcomes.
In our BMI_GDM model sensitivity analysis, we observed that the 2-hour antenatal OGTT glucose peak was associated with a stronger prediction of postpartum T2D (AUC=0.86) compared with the fasting glucose (AUC=0.76) in Singaporean women. Future studies with greater statistical power will be needed to confirm whether the postpartum T2D risk is heterogenous across different thresholds of glucose tolerance for GDM diagnostic criteria.

Limitations
This study has some limitations due to the scarcity of longitudinal data. Postpartum OGTT at 4-12 weeks, and further testing in those with normal postpartum OGTT every 1-3 years were not administered in the GUSTO study, possibly underestimating the development of postdelivery dysglycemia to a certain extent and inducing bias. However, the mothers participating in GUSTO self-reported T2D status 2 years after delivery, and there were no self-reported T2D cases. Our prediction models were trained on a limited cohort of 561 pregnancies and require further validation using larger cohorts such as Electronic Health Record databases. A subcohort analyses by individual ethnic groups can be trained with larger data sets.

Comparison With Prior Work
Our early implementation of T2D risk prediction algorithm during prenatal care enables early engagement of patients and remote monitoring, compared to existing molecular biomarker-based T2D risk prediction algorithms [7,8] developed for postpartum care. The 2 midgestation clinical features (midpregnancy BMI after gestational weight gain and diagnosis of GDM) discovered from our machine learning workflow are of low cost and easily accessible during routine antenatal GDM screening. The digital biomarkers identified from our work will guide antenatal research in preventing the progression of GDM to T2D.

Conclusions
The key strength of our study lies in applying machine learning-based predictive analytics during prenatal care in the early prediction of postpartum T2D. This machine learning model can be leveraged as a risk stratification tool for preventive intervention.