Estimating the Risk of Lower Extremity Complications in Adults Newly Diagnosed With Diabetic Polyneuropathy: Retrospective Cohort Study

doi:10.2196/60141

¹Departments of Health Policy and of Epidemiology and Population Health, School of Medicine, Stanford University, Encina Commons, 615 Crothers Way, Stanford, CA, United States

²Division of Research, Kaiser Permanente Northern California, Pleasanton, CA, United States

³Department of Family Medicine, University of Colorado School of Medicine, Institute for Health Research, Kaiser Permanente Colorado, Aurora, CO, United States

⁴Department of Neurology, School of Medicine, University of Michigan, Ann Arbor, MI, United States

⁵Mid-Atlantic Permanente Research Institute, Washington DC, MD, United States

⁶The Queen's Medical Center, Honolulu, HI, United States

⁷Kaiser Permanente South San Francisco, South San Francisco, CA, United States

⁸Kaiser Permanente Oakland Medical Center, Oakland, CA, United States

⁹Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States

Corresponding Author:

Alyce S Adams, PhD

Background: Diabetes-related lower extremity complications, such as foot ulceration and amputation, are on the rise, currently affecting nearly 131 million people worldwide. Methods for early detection of individuals at high risk remain elusive. While data-driven diabetic polyneuropathy algorithms exist, high-performing, clinically useful tools to assess risk are needed to improve clinical care.

Objective: This study aimed to develop an electronic medical record–based machine learning algorithm that would predict lower extremity complications.

Methods: We conducted a retrospective longitudinal cohort study to predict the risk of lower extremity complications within 24 months of an initial diagnosis of diabetic polyneuropathy. From an initial cohort of 468,162 individuals with at least 1 diagnosis of diabetic polyneuropathy at one of 2 multispecialty health care systems (based in northern California and Colorado) between April 2012 and December 2016, we created an analytic cohort of 48,209 adults with continuous enrollment who were newly diagnosed and had no evidence of end-of-life care. The outcome was any lower extremity complication, including foot ulceration, osteomyelitis, gangrene, or lower extremity amputation. We randomly split the data into training (38,569/48,209; 80%) and testing (9640/48,209; 20%) datasets. In the training dataset, we used super learner (SL), an ensemble learning method that employs cross-validation and combines multiple candidate risk predictors into a single risk predictor. We evaluated the performance of the SL risk predictor in the testing dataset using the receiver operating characteristic curve and a calibration plot.

Results: Of the 48,209 individuals in the cohort, 2327 developed a lower extremity complication during follow-up. The SL risk estimator exhibited good discrimination (AUC=0.845, 95% CI 0.826-0.863) and calibration. A modified version of our SL algorithm, simplified to facilitate real-world adoption, had only slightly reduced discrimination (AUC=0.817, 95% CI 0.797-0.837). The modified version slightly outperformed the naïve logistic regression model (AUC=0.804, 95% CI 0.783-0.825) in terms of precision gained relative to the frequency of alerts and the number of patients who needed to be evaluated.

Conclusions: We have built a machine learning–based risk estimator with the potential to improve clinical detection of diabetic patients at high risk for lower extremity complications at the time of an initial diabetic polyneuropathy diagnosis. The algorithm exhibited good discriminant validity and calibration using only data from the electronic medical record. Additional research will be needed to identify optimal contexts and strategies for maximizing algorithmic fairness in both interpretation and deployment.

JMIR Diabetes 2025;10:e60141


Keywords

diabetes; neuropathy; machine learning; risk prediction; lower extremity complication; diabetic; AI; artificial intelligence; diabetic polyneuropathy; California; Colorado; USA; foot ulcer; logistic regression model; regression model; lower extremity

Up to 50% of the more than 38 million Americans who have diabetes experience some peripheral nerve damage, known as diabetic polyneuropathy [1,2]. Distal symmetric polyneuropathy, the most common type of diabetic polyneuropathy, is frequently characterized by pain, tingling, and numbness that starts in the extremities. Motor and autonomic involvement, also seen in those with diabetic polyneuropathy, can lead to foot deformity. Lower extremity complications associated with diabetic polyneuropathy include foot ulceration, osteomyelitis, and gangrene, leading to amputation [1-4]. Diabetic polyneuropathy is the leading risk factor for the recent resurgence in nontraumatic lower extremity amputations in the United States [3,4], with health care costs falling between US $4.6 billion and US $13.7 billion per year [1].

Although there is no cure for this condition, it may be possible to reduce the clinical impact of diabetic polyneuropathy through control of blood sugar levels, annual foot checks, patient education, and specialty referrals (eg, podiatry) [5]. Novel digital interventions may also hold promise for improving self-care and function among individuals at risk of lower extremity complications [6]. While likely not done often enough, identifying and referring high-risk patients for enhanced educational intervention may be cost-effective [7].

Both clinical and nonclinical factors may contribute to delayed diagnosis and undertreatment in subgroups of this patient population [8-10]. For example, differences in patient presentation or clinician interpretation of symptoms may be driven by clinical, interpersonal and societal factors. Additional tools are needed to reduce diagnostic uncertainty, as well as augment human factors to promote evidence-based care [8].

Several screening tools designed to facilitate early detection of diabetic polyneuropathy are currently used in clinical practice. These include monofilament tests, brief questionnaires (eg, the Michigan Neuropathy Screening Instrument) [11], and vibration testing, among others. Collectively, these instruments have been criticized for their lack of accuracy and vulnerability to human error and biases that lead to missed opportunities for follow-up [9,10].

Validated tools are needed to facilitate risk detection and reduce diagnostic uncertainty in the management of diabetic polyneuropathy [12-16]. However, the quality and transparency of existing risk stratification systems and algorithms is highly variable, limiting their use in everyday clinical practice [17]. The aim of this study was to evaluate the accuracy and clinical use of a machine learning (ML)–based algorithm designed to predict complications that develop within 2 years of an initial diabetic polyneuropathy diagnosis.

Creation of Analytic Cohort of Patients Newly Diagnosed With Diabetic Polyneuropathy

All data for this study were extracted from electronic medical records (EMRs) at 2 Kaiser Permanente regions, Northern California and Colorado, with facilities serving more than 5 million people. Patients with diabetes and related chronic conditions receiving care in these facilities are typically assigned to a single primary care provider with a robust panel management approach that leverages performance feedback, system-wide efficiencies, disease registries, and evidence-based practice [18].

We included clinicians engaged in diabetes quality improvement initiatives from 4 Kaiser Permanente regions and the University of Michigan Health System at each stage of the research endeavor to maximize the potential clinical use of the resulting algorithm [19].

The EMRs at both Kaiser Permanente health systems have automated patient files, which facilitate longitudinal observation of use and clinical assessments obtained across systems of care (eg, hospital, laboratory, pharmacy, and clinic). The Kaiser Permanente Northern California and Colorado embedded research units share a common data model for organizing EMR and administrative (claims) data for research use [20]. Data captured by the common data model include, but are not limited to, outpatient encounters, emergency department and inpatient claims, pharmacy orders and prescription fills, laboratory orders and results, and member enrollment and benefit coverage.

Using the EMRs, we identified 468,162 individuals who carried at least 1 diabetes-related diagnosis based on the International Classification of Diseases, Ninth and Tenth Revisions (ICD-9 and ICD-10, respectively) between April 1, 2012, and December 31, 2016. From this group, we identified a subset of 121,619 individuals with confirmed diabetic polyneuropathy (ie, at least 1 inpatient or 2 outpatient ICD-9 or ICD-10 diagnoses within 12 months of each other) who were at least 18 years of age at the time of their first diagnosis. Measures of polyneuropathy severity were not available through the electronic health record. Therefore, clinicians on the study team reviewed the list of codes and discussed any disagreements on code inclusion before the list was finalized. Detailed codes used to identify diabetic polyneuropathy are included in Multimedia Appendix 1.

From the cohort of 121,619, we excluded 20,806 individuals who had 2 or more months of disenrollment from the health plan in the 24 months before the first diabetic polyneuropathy diagnosis (ie, start of follow-up) to reduce the likelihood of missing healthcare use data. Among the remaining 100,813, we excluded 51,728 individuals who had evidence of a possible previous diagnosis of diabetic polyneuropathy. Finally, we also excluded 876 individuals who received hospice or palliative care during the 24 months before diagnosis because these individuals may be less likely to receive usual standard care and would be unlikely to benefit from early identification of lower extremity complications. The resulting analytic cohort included 48,209 adults with newly diagnosed diabetic polyneuropathy.
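The exclusion steps above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in the text; the intermediate count after the second exclusion is derived here and is not stated in the article, and the variable names are illustrative.

```python
# Cohort attrition, using the counts reported in the text.
confirmed_dpn = 121_619  # adults with confirmed diabetic polyneuropathy

exclusions = [
    ("disenrolled 2 or more months in the prior 24 months", 20_806),
    ("possible previous polyneuropathy diagnosis", 51_728),
    ("hospice or palliative care in the prior 24 months", 876),
]

remaining = confirmed_dpn
remaining_after_each_step = []
for reason, n_excluded in exclusions:
    remaining -= n_excluded
    remaining_after_each_step.append(remaining)
    print(f"after excluding {reason}: {remaining:,}")

analytic_cohort = remaining
```

The first and last values match the reported counts of 100,813 and 48,209.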

Identification of Lower Extremity Events and Predictors

We created a composite time-to-event outcome that included the 4 most common diabetic polyneuropathy-related lower extremity complications: foot ulceration (77.4%), osteomyelitis (6.53%), gangrene (5.46%), and nontraumatic lower extremity amputation (10.61%). We used evidence from earlier literature and clinician review to determine the diagnoses to be included [21-26]. A comprehensive list of codes used to identify lower extremity events is included in Multimedia Appendix 2. To distinguish between new and ongoing events, we relied on the earlier peer-reviewed literature to define a standard phase-out period that accounts for the resolution of treatment for each type of event [24].

Identification of Candidate Predictors

A list of candidate covariates for predicting the outcome is presented in Multimedia Appendix 3. Drawing on the existing literature, we included several baseline characteristics (up to 24 months before the start of follow-up, ie, first diabetic polyneuropathy diagnosis) that were identified in previous studies as covariates associated with adverse diabetic polyneuropathy-associated events. These included self-reported demographics (age, sex, race, and ethnicity), clinical risk factors (HbA1c levels, lipid levels, body mass index, comorbidity, blood pressure, use of specific medications [eg, insulin], smoking status, and alcohol use), specific comorbidities (cardiovascular disease, peripheral artery disease, atrial fibrillation, heart disease, chronic pain, rheumatoid arthritis, sleep apnea, nondiabetic neuropathies, and previous falls), and other indicators of diabetes severity (number of different diabetes diagnoses, chronic kidney disease, cellulitis, diabetic retinopathy, stroke, Charcot foot, and previous diabetic polyneuropathy events) [12,13,17,26-28]. Several of these predictors, including laboratory results (blood pressure and cholesterol), use of health services (eg, durable medical equipment and diabetes medications), a Comorbidity Point Score [29] (based on the Centers for Medicare & Medicaid Services Hierarchical Condition Categories [score range: 0‐1014; scores >300 are rare]), and behavioral risk factors (eg, smoking and alcohol use), were not included in previous prediction algorithms used to estimate the risk of neuropathy-related outcomes [12].

For each covariate with missing values, we included a missing indicator variable and imputed missing values for continuous variables with the median value. Covariates with missing values are indicated as imputed in Multimedia Appendix 3. We note that baseline covariate values marked as unavailable are not included as “missing data” because we aim to predict outcomes using the type of data routinely available in health care databases, which are, by their nature, highly variable across patients and systems. Thus, we did not use advanced analytic methods to address bias concerns associated with missing data, as is typically warranted in causal inference problems (eg, multiple imputation).
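The missing-indicator approach described above can be illustrated as follows. This is a minimal sketch, not the study code: the function name and the HbA1c values are hypothetical, and in a prediction setting the median would typically be taken from the training data, which is an assumption the text does not spell out.

```python
from statistics import median

def impute_with_indicator(values):
    """Return (imputed_values, missing_flags) for one continuous covariate.

    Missing entries (None) are replaced by the median of the observed
    entries, and a parallel 0/1 flag records which entries were imputed.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return imputed, flags

# Hypothetical HbA1c values (%); two patients lack a baseline result.
hba1c = [7.2, None, 8.9, 6.5, None, 10.1]
hba1c_imputed, hba1c_missing = impute_with_indicator(hba1c)
```

Keeping the flag as a separate covariate lets a prediction model use the fact that a value was unavailable, which may itself carry information about how a patient uses care.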

Algorithm Development and Validation

We developed an algorithm to predict lower extremity complications during the 24 months following an initial diagnosis of diabetic polyneuropathy. The first observed date of a diabetic polyneuropathy diagnosis served as the index date for the algorithm; this timepoint was defined as the start of patient follow-up and the clinical decision point when the constructed algorithm would be applied in practice. The first observed lower extremity complication was identified as the event date. We administratively censored follow-up after the eighth quarter (91-day interval) of follow-up (ie, at approximately 2 years).
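To make the discrete-time setup concrete, the sketch below expands one patient's follow-up into one row per 91-day quarter, stopping at the eighth quarter. The function name, the convention that days 0-90 form quarter 1, and the choice to count a patient as at risk in the quarter in which observation ends are assumptions made for illustration; the article's exact definitions are in Multimedia Appendix 4.

```python
QUARTER_DAYS = 91  # interval length stated in the text
MAX_QUARTERS = 8   # administrative censoring after the eighth quarter

def person_period_rows(days_to_end, had_event):
    """Expand one patient's follow-up into one row per quarter at risk.

    days_to_end: days from the index date to the event or to the end of
                 observation (must be >= 0).
    had_event:   True if follow-up ended with a lower extremity complication.
    Returns a list of (quarter, event_in_quarter) tuples.
    """
    last_quarter = days_to_end // QUARTER_DAYS + 1  # days 0-90 -> quarter 1
    rows = []
    for q in range(1, min(last_quarter, MAX_QUARTERS) + 1):
        event_here = had_event and q == last_quarter
        rows.append((q, int(event_here)))
    return rows

# An event on day 200 falls in quarter 3; an event on day 900 falls after
# the eighth quarter, so that patient contributes 8 event-free rows.
early_event = person_period_rows(200, True)
late_event = person_period_rows(900, True)
```

Rows in this person-period format are what a hazard model is fit to: each row asks whether the event occurred in that quarter, given that it had not occurred earlier.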

To develop the algorithm, we combined data from the 2 health systems into a single dataset, then randomly split the data into distinct training (38,569/48,209; 80%) and testing (9640/48,209; 20%) datasets [30]. Selection of a risk predictor using the super learner (SL) ensemble learning methodology [31] was based solely on the results obtained using the training dataset; the independent testing dataset was used to evaluate the resulting discrimination and calibration performance. SL is a general loss-based ensemble learning method that uses cross-validation to combine [32] multiple candidate risk predictors defined by ML algorithms (eg, random forest) or parametric (eg, logistic) models into a single risk predictor referred to as “super learner”. The approach is grounded in statistical theory, and its practical performance was demonstrated in previous applications [33].

Following the general approach of Polley and van der Laan [34], we used SL to construct a single point estimator of the vector of the discrete-time conditional hazards. The detailed mathematical formulas relating to the approach can be found in Multimedia Appendix 4. The algorithm was implemented using the sl3 R package (R Foundation for Statistical Computing) [35] with 10-fold cross-validation and the L2 loss function.
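The core idea of combining candidate predictors by cross-validated loss can be sketched in a few lines. The example below is not the sl3 implementation and does not use its API: it is a simplified pure-Python illustration with 2 hypothetical candidate predictors, simulated data, and a grid search for the convex weight that minimizes cross-validated squared-error (L2) loss.

```python
import random

def fit_mean(train):
    """Candidate 1: predict the overall event rate for everyone."""
    rate = sum(y for _, y in train) / len(train)
    return lambda x: rate

def fit_stratified(train):
    """Candidate 2: predict the event rate within each level of x."""
    overall = sum(y for _, y in train) / len(train)
    rates = {}
    for level in {x for x, _ in train}:
        ys = [y for x, y in train if x == level]
        rates[level] = sum(ys) / len(ys)
    return lambda x: rates.get(x, overall)

def super_learner_weight(data, n_folds=10, grid=101):
    """Pick the convex weight on candidate 2 that minimizes cross-validated
    squared-error loss; candidate 1 receives the remaining weight."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    cv_preds = []  # (prediction 1, prediction 2, outcome) on held-out rows
    for k in range(n_folds):
        train = [row for j, f in enumerate(folds) if j != k for row in f]
        m1, m2 = fit_mean(train), fit_stratified(train)
        cv_preds += [(m1(x), m2(x), y) for x, y in folds[k]]

    def loss(a):
        return sum(((1 - a) * p1 + a * p2 - y) ** 2 for p1, p2, y in cv_preds)

    return min((i / (grid - 1) for i in range(grid)), key=loss)

# Simulated data (hypothetical): x = 1 marks a prior event, which raises risk.
rng = random.Random(0)
data = []
for _ in range(2000):
    x = int(rng.random() < 0.3)
    y = int(rng.random() < (0.20 if x else 0.02))
    data.append((x, y))

alpha = super_learner_weight(data)
```

Because the weight is chosen on held-out predictions, a candidate that overfits its training folds is penalized. In this simulation the stratified candidate carries real signal, so it should receive most of the weight.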

Because the computing infrastructure needed to generate real-time predictions from ML algorithms (eg, random forest) may be limited in some clinical settings, we also implemented a simplified version of the prediction approach described above in which we restricted the library of candidate predictors to a main-term logistic regression using training data pooled across all 8 quarters. This is equivalent to a classical discrete-time survival model with a logit link function [36]. The resulting predictor is thus a simpler function of predicted values from a single logistic regression that can be easily implemented in real-world clinical settings. In addition to this hazard-based logistic regression estimator for the cumulative incidence at 24 months, we also implemented a naïve estimator of the same cumulative incidence using a simple complete-case logistic regression in which outcomes from patients who were right-censored due to death or disenrollment from the health plan before 24 months were treated as missing. We compared the performance [37] of the 3 estimators of the previously described cumulative risks at 24 months (hazard-based SL, hazard-based logistic regression, and naïve logistic regression) using the areas under the receiver operating characteristic curves (AUCs) to evaluate discrimination, calibration plots (ie, plots of observed versus predicted risks), and standard measures of predictive accuracy for a diagnostic test, including sensitivity, specificity, positive and negative predictive value, and number needed to evaluate, over a range of risk thresholds [38].
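The link between quarterly hazards and the 24-month cumulative risk can be illustrated with a short calculation. This is a minimal sketch of the standard discrete-time survival identity, assuming 8 predicted quarterly hazards per patient; the function name and example values are hypothetical, and the estimator actually used in the study is described in Multimedia Appendix 4.

```python
def cumulative_risk(hazards):
    """Cumulative incidence implied by a sequence of discrete-time hazards.

    hazards[k] is the predicted probability of an event in interval k + 1
    given no event in earlier intervals. The chance of remaining event-free
    through all intervals is the product of (1 - hazard), so the cumulative
    risk is one minus that product.
    """
    event_free = 1.0
    for h in hazards:
        event_free *= (1.0 - h)
    return 1.0 - event_free

# Hypothetical patient with a constant 1% hazard in each of 8 quarters.
risk_24_months = cumulative_risk([0.01] * 8)
```

For this hypothetical patient the 24-month risk is about 7.7%, slightly less than the 8% obtained by simply adding the quarterly hazards.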

Finally, to determine whether the SL predictions were consistent with previous predictive approaches, we also examined select patient characteristics identified as predictive of risk in earlier studies by quintiles of the SL risk estimates for the 9640 patients in the testing dataset. Based on the results from previous studies, the following characteristics were examined: age, HbA1c levels, race and ethnicity, sex, evidence of select comorbid conditions known to be associated with lower extremity risk (ie, chronic kidney disease, heart failure, and peripheral artery disease), and a history of diabetic polyneuropathy events. We also examined differences in rates of symptoms by risk score quintile in the subset of patients in the test dataset who had been screened for diabetic polyneuropathy during the 12 months before their diagnosis using a single-item questionnaire (3644/9590; 38%). Data were extracted and formatted using SAS (version 9.4; SAS Institute) and analyses were performed in R (version 3.4.4).

Ethical Considerations

This study was approved by the institutional review board at Kaiser Permanente Northern California. Kaiser Permanente Colorado ceded authority to the Kaiser Permanente Northern California Institutional Review Board. For this study, the requirement that informed consent and Health Insurance Portability and Accountability Act Privacy Rule authorization be obtained from study participants was waived. To protect the data and privacy of the patients included in our study, we limited the number of individuals who would have access to identifiable data. We removed personal identifiers from data that were transmitted outside each health system via a secure transfer site. Password-protected data were stored on the servers behind the firewalls maintained by each health system. Primary data were not distributed outside the 2 health care systems and only summary tables and figures were shared with external collaborators. In addition, we used randomly generated identifiers on all study documents, secured storage of digital data (computer files) on password-protected computers, and limited access to data with potentially identifying features to members of the research team working under the direction of the investigators for the duration of the project.

Baseline Characteristics of Patients Meeting the Inclusion Criteria

The cohort identification strategy is described in Figure 1; 48,209 adults newly diagnosed with diabetic polyneuropathy met the inclusion criteria for this study. At the time of their initial diabetic polyneuropathy diagnosis, individuals in the training and testing datasets had similar demographic and clinical characteristics. A comprehensive list of characteristics is presented in Multimedia Appendix 3. Overall, the average age at the time of diabetic polyneuropathy diagnosis was 64 years (SD 12 years). In this cohort, 54% (25,828/48,209) were male; 52% (25,110/48,209) were of White race, 13% (6095/48,209) were of Asian race, 11% (5221/48,209) were of Black or African American race, 20% (9708/48,209) were of Hispanic or Latinx ethnicity, fewer than 2% (797/48,209) were of Native Hawaiian, Pacific Islander, or Native American race, and fewer than 3% (1278/48,209) had unknown race or ethnicity. The mean comorbidity point score [29,39], which was calculated using diagnosis records from the 12 months before the diabetic polyneuropathy diagnosis, was 36.91 (SD 31.05).

Among the 48,209 patients newly diagnosed with diabetic polyneuropathy in this study sample, 2327 (4.83%) developed a lower extremity complication during the 24-month follow-up period. A comparison of characteristics of those who did and those who did not develop a lower extremity complication during follow-up is presented in Multimedia Appendix 5.

**Figure 1.** Cohort identification and selection criteria.

Discrimination and Calibration

As shown in Figure 2, the AUC statistic for the more complex SL was 0.845 (95% CI 0.826-0.863), indicating very good discrimination between high- and low-risk patients. Figure 2A compares the ability of the model to distinguish between those at high and low risk compared with a perfect test and chance. The solid blue lines at the left and top borders represent a hypothetical test that perfectly distinguishes between high- and low-risk patients. The solid black curve displays the results of the predictive model, which correctly classifies patients with an event more often than it misclassifies patients. The dashed line bisecting the graph represents a test that correctly classifies patients 50% of the time (ie, by chance). Figure 2B displays the observed and predicted probability of an event over the 24 months following an initial diabetic polyneuropathy diagnosis. The dotted line represents a hypothetical model that perfectly predicts the event. The dashed line is the model-predicted risk at 24 months. The filled circles represent the number of actual events occurring over time.

The SL predictor correctly classified patients who experienced an event as being at higher risk more often than it misclassified patients who did not experience an event as being at higher risk. Figure 2 also compares the observed with the predicted probability of a lower extremity complication over the 24 months following an initial diabetic polyneuropathy diagnosis; the algorithm demonstrated a high level of accuracy in predicting risk relative to perfect prediction. In Figure 3, the blue line with circles represents the percent of people with an event (ie, the positive predictive value), and the green line with diamonds represents the number needed to evaluate.

The receiver operating characteristic curve for a simplified SL approach that included only one learner (logistic regression) was similar to that of the more complex SL approach; the AUC for the simplified SL approach was 0.817 (95% CI 0.797-0.837). The similar performance indicated that a simplified logistic regression approach may be a viable alternative for use in clinical practice. By contrast, a naïve logistic regression model underperformed relative to both the complex and simplified SL approaches, with an AUC of 0.804 (95% CI 0.783-0.825).
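For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient with an event receives a higher predicted risk than a randomly chosen patient without one. The sketch below computes it by direct pairwise comparison; the function name and the 6 example patients are hypothetical, and this quadratic-time version is intended for illustration rather than for datasets of the size used in this study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the proportion of (event, non-event) pairs in which the patient
    with the event received the higher risk score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks and observed outcomes for six patients.
scores = [0.02, 0.05, 0.40, 0.10, 0.60, 0.03]
labels = [0, 0, 1, 0, 1, 1]
example_auc = auc(scores, labels)
```

In this toy example, 7 of the 9 event and non-event pairs are ordered correctly, giving an AUC of about 0.78.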

To evaluate the use of these estimators for clinical decision-making, we compared the sensitivity, specificity, and the number needed to evaluate for varying risk thresholds at which an alert would be issued for the simplified SL compared with the naïve regression approach, as shown in Multimedia Appendix 6. Relative to a naïve logistic regression approach, the simplified SL approach yielded greater precision, triggered fewer alerts, and required fewer patients to be evaluated.

Figure 3 shows the positive predictive value relative to the number of patients needed to be evaluated as a function of the decision threshold. Setting an alert to trigger when an individual’s estimated risk reaches 30%‐50% using the simplified SL approach would yield a positive predictive value between 70%‐80%. Furthermore, we estimate that fewer than 2 patients would need to be evaluated to identify one likely to develop a lower extremity complication.
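The threshold-based measures discussed above can be computed from a simple 2-by-2 classification at a chosen alert threshold. The following is a minimal sketch with hypothetical scores and outcomes; it takes the number needed to evaluate as the reciprocal of the positive predictive value, which is the conventional definition and is consistent with the values reported here (a positive predictive value of 70% corresponds to about 1.4 patients). It assumes the data contain both events and non-events.

```python
def alert_metrics(scores, labels, threshold):
    """Diagnostic accuracy when an alert fires at or above `threshold`."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    # PPV is undefined when no alert fires at this threshold.
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        # Patients who must be evaluated to find one who develops an event.
        "number_needed_to_evaluate": 1 / ppv if ppv != 0 else float("inf"),
    }

# Hypothetical predicted risks and observed outcomes for eight patients.
scores = [0.05, 0.10, 0.35, 0.45, 0.50, 0.02, 0.60, 0.08]
labels = [0, 0, 1, 1, 0, 0, 1, 1]
metrics = alert_metrics(scores, labels, threshold=0.30)
```

Raising the threshold generally increases the positive predictive value and lowers the number needed to evaluate, at the cost of missing more patients who go on to have an event.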

**Figure 2.** Discrimination and calibration plots for the model predicting hazard of lower extremity events among adults newly diagnosed with diabetic peripheral neuropathy (n=48,209). (A) Comparison of the model’s ability to distinguish between those at high and low risk compared with a perfect test and chance. The solid blue lines at the left and top borders represent a hypothetical test that perfectly distinguishes between high- and low-risk patients. The solid black curve displays the results of the predictive model, which correctly classifies patients with an event more often than it incorrectly misclassifies patients. The dashed line bisecting the graph represents a test that correctly classifies patients 50% of the time (ie, by chance). (B) Display of the observed and predicted probability of an event over the 24 months following an initial diagnosis of diabetic polyneuropathy. The dotted line represents a hypothetical model that perfectly predicts the event. The dashed line is the model-predicted risk at 24 months. The filled circles represent the number of actual events occurring over time.

**Figure 3.** Positive predictive value and number needed to evaluate as a function of the decision threshold. The blue line with enclosed circles represents the percent with an event or the positive predictive value. The green line with enclosed diamonds represents the number needed to evaluate.

Principal Findings

In this study, we leveraged rich clinical data from a longitudinal electronic health record to estimate the risk of developing lower extremity complications (ie, foot ulceration, osteomyelitis, gangrene, or amputation) among adults newly diagnosed with diabetic polyneuropathy. The resulting ML-enabled algorithm predicted risk of lower extremity complications with a high level of discriminant validity (AUC=0.845, 95% CI 0.826-0.863) and calibration. We concluded that a clinician would have to evaluate fewer than 2 patients newly diagnosed with diabetic polyneuropathy to identify one who would have a lower extremity event within the next 2 years.

Comparison With Previous Work

Our algorithm performed better than a similar algorithm developed by Goyal et al [40] to identify infections in diabetic foot ulcers (AUC of 0.658). This difference may be due to our use of data from systems with a common data model, which reduces coding variability. Two other approaches reported accuracies for predicting diabetic foot and the severity of neuropathy that exceeded 0.9 [41,42]. However, these studies used additional data, including clinical assessments and patient questionnaires, that are not routinely documented in clinical practice.

Consistent with the earlier literature [12,13,40-43], our predictor identified several clinical subgroups that may be at higher risk for lower extremity complications, including individuals with poorer glycemic control, cardiovascular comorbidity, and previous lower extremity events. However, additional work will be needed to estimate both intended and unintended effects [44] of implementing the prediction algorithm in diverse care settings.

Strengths and Weaknesses

Compared with previously published studies, strengths of our prediction approach include the use of a shorter time horizon (ie, 2-year time horizon), internal validation using an independent testing dataset, the inclusion of risk factors that are commonly available in EMRs, the use of a common data model [45], and the focus on overall risk of all lower extremity events rather than a single outcome (eg, foot ulceration).

Nonetheless, our prediction approach featured several important weaknesses. First, the predictor may be vulnerable to bias due to variability in the frequency of clinical monitoring during follow-up (ie, informative interval censoring). Second, our earlier studies identified differences in rates of diagnosis based on demographic factors (eg, age, race, and socioeconomic factors) [10]. This finding suggests that the predictor may underestimate risk in underserved populations. Additional research will be needed to evaluate the potential harm associated with the use of risk scores as well as other potential biases resulting from interval censoring [44].

Future Directions

The diabetic polyneuropathy predictor developed using ML methods and electronic health data from a common data model demonstrated good calibration and a high level of predictive accuracy. Such a tool could be useful for identifying patients who might benefit from promising interventions. However, the use of this and similar prediction tools is limited by the availability of evidence-based practices and protocols to guide clinical decision making in response to risk information. Given the potential benefit of risk stratification of newly diagnosed patients, our results support the value of further research into how this might be implemented in clinical practice, including the potential unintended consequences of applying risk predictors in clinical practice and ensuring that these tools are applied equitably to the benefit of all patients.

Conclusions

Diabetic polyneuropathy is a complex condition. There is not always a clear way to perform risk stratification for lower extremity complications associated with this disorder based on easily identifiable characteristics [2,5]. We hypothesize that diabetic polyneuropathy predictive analytics may be especially useful for identifying patients with diabetic polyneuropathy who are at highest risk for lower extremity complications soon after an initial diagnosis [8,17]. Nonetheless, technology-based prediction tools are not a panacea for complex clinical management. Consistent, strong evidence from diverse datasets and health care systems will be needed to determine the use of these and other strategies for patients, providers, and health systems in the context of real-world clinical practice [46].

Acknowledgments

The authors are grateful to J David Powers, who received funding to develop the dataset at Kaiser Permanente Colorado, for providing the data used in this study. The authors also thank and acknowledge the contributions of Drs Rick Dlott, Henry Lew, Mitchell Motooka, and Anna Cosyleon for providing contextual information regarding the potential application of the algorithm at Kaiser Permanente Northern California, Hawaii, and Colorado. These individuals did not receive funding for their participation. The authors would also like to thank Neesha Joseph, MPP, and Cyril Osifo-Doe, BS, for assistance in preparing the manuscript for submission. This study was funded by the Garfield Memorial Fund (GF001034). Authors ASA, RN, and JAS acknowledge funding from the Diabetes Research and Engagement through Advanced Multilevel Science (DREAMS) Center for Diabetes Translational Research (2P30DK0928924) and Patient Centered Outcomes Research Institute (PCORI; ME-2018C1-10942). ASA, EAB, BC, JAS, LM, and RN acknowledge funding from a PCORI Assessment of Prevention, Diagnosis, and Treatment Options Program Award (CE-1304-7250; ClinicalTrials.gov NCT02056431). RN acknowledges funding from PCORI (ME-1403-12506), and ASA acknowledges funding from the National Institute on Aging (R33AG057806). The funders were not involved in the study design, data collection, analysis, or interpretation of the study results.

Data Availability

The data used in this analysis were obtained directly from the electronic health records of the study participants. Because of privacy issues, our institutional review board does not permit us to make these data publicly available. However, metadata are available from author RN upon reasonable request.

Authors' Contributions

ASA and RN performed the research and generated data, contributed to the discussion, and wrote the first draft of the manuscript. GE, EAB, BC, RWG, MH, JAS, CMT, LKG, and EK contributed to the discussion and reviewed and edited the manuscript. ASA, CL, GE, NSH, LM, and RN performed the research and reviewed and edited the manuscript. All authors approved the final version of the manuscript. RN is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Conflicts of Interest

BC receives editorial and research support from the American Academy of Neurology (AAN), consults for Dynamed, and performs medicolegal consultations.

Multimedia Appendix 1

List of International Classification of Diseases, Ninth and Tenth Revisions codes used to identify individuals with diabetic polyneuropathy.

DOCX File, 15 KB

Multimedia Appendix 2

Procedures used to identify the onset of lower extremity complications among adults newly diagnosed with diabetic polyneuropathy.

DOCX File, 14 KB

Multimedia Appendix 3

Characteristics of the study cohort.

DOCX File, 34 KB

Multimedia Appendix 4

Detailed description of algorithm development.

DOCX File, 13 KB

Multimedia Appendix 5

Comparison of predictor characteristics by event type.

DOCX File, 31 KB

Multimedia Appendix 6

Statistical performance of different risk prediction approaches.

DOCX File, 15 KB


Abbreviations

AUC: area under the receiver operating characteristic curve

EMR: electronic medical record

ICD-10: International Classification of Diseases, Tenth Revision

ICD-9: International Classification of Diseases, Ninth Revision

ML: machine learning

SL: super learner

Edited by Leo Quinlan; submitted 02.05.24; peer-reviewed by Chenhao Fang, Martin C Berli, Soroosh Tayebi Arasteh, Suresh Munusamy; final revised version received 29.03.25; accepted 14.04.25; published 29.05.25.

© Alyce S Adams, Catherine Lee, Gabriel Escobar, Elizabeth A Bayliss, Brian Callaghan, Michael Horberg, Julie A Schmittdiel, Connie Trinacty, Lisa K Gilliam, Eileen Kim, Nima S Hejazi, Lin Ma, Romain Neugebauer. Originally published in JMIR Diabetes (https://diabetes.jmir.org), 29.5.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Diabetes, is properly cited. The complete bibliographic information, a link to the original publication on https://diabetes.jmir.org/, as well as this copyright and license information must be included.
