Machine Learning Enhances Diabetes Prediction Across Diverse Ethnicities

TL;DR:

  • Researchers employ machine learning to predict T2D incidence and prevalence across various ethnicities.
  • Machine learning technology offers non-invasive screening and early assessment for T2D.
  • The study uses UK Biobank data for training and Lifelines data for validation.
  • Questionnaire-based models outperform traditional methods, even across different ethnic groups.
  • Incorporating biomarkers enhances prediction accuracy, while physical data has a limited impact.
  • This breakthrough promises precise and cost-effective T2D risk assessment for all populations.

Main AI News:

In a study recently published in eClinicalMedicine, researchers used machine learning to predict the incidence and prevalence of type 2 diabetes mellitus (T2D) across diverse ethnic backgrounds. The approach is particularly relevant for non-white individuals, who face distinct challenges related to early T2D detection and its consequences.

Screening and prediction technologies play a pivotal role in the timely identification and management of T2D, especially within non-white populations. These groups grapple with a complex interplay of factors that accelerate the onset of diabetes, amplifying its health repercussions.

Machine learning-based tools enable non-invasive screening, supporting preliminary assessments and referrals that can improve population health while reducing healthcare costs.

The Study Unveiled

In this study, researchers built questionnaire-based prediction models for T2D incidence and prevalence. The models were trained on data from the United Kingdom Biobank (UKBB) and then validated on Lifelines study data, covering both white and non-white individuals.

The questionnaire-based algorithms were trained on UKBB’s white population data. Their clinical potential was compared with two other models that incorporated additional variables (physical measurements and biological markers), as well as with gold-standard clinical risk assessment models for predicting T2D. Logistic regression was used to predict T2D incidence and prevalence.

The training dataset comprised a substantial cohort of white individuals from the UKBB study, while validation included individuals from five non-white ethnicities using Lifelines data. Rigorous feature selection was conducted during model development. The predictive accuracy of the models was measured using the area under the receiver operating characteristic (ROC) curve (AUC), accompanied by sensitivity analyses to gauge their clinical utility.
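To make the AUC metric concrete, here is a minimal sketch of how it can be computed from predicted risk scores. It uses the pairwise interpretation of AUC (the probability that a randomly chosen case receives a higher score than a randomly chosen non-case). The function name `roc_auc` and the toy data are illustrative assumptions, not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (case, non-case) pairs ranked correctly.

    labels: 1 for T2D, 0 for no T2D. scores: predicted risk per person.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case and one non-case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy, made-up data (not study data)
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values reported later in the article should be read.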

Additionally, a reclassification analysis was performed, comparing the questionnaire-only prediction models with those incorporating biomarkers and physical measurements, and with clinical T2D risk assessment tools.
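The article does not say which reclassification statistic was used. One common choice is the categorical net reclassification improvement (NRI), sketched below purely as an illustration. The single risk threshold of 0.5, the function name, and the toy data are all assumptions.

```python
def net_reclassification_improvement(labels, old_risk, new_risk, threshold=0.5):
    """Categorical NRI with one risk threshold (illustrative assumption).

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_events = sum(labels)
    n_nonevents = len(labels) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("NRI needs both events and non-events")

    for y, old, new in zip(labels, old_risk, new_risk):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if y == 1:
            up_event += moved_up
            down_event += moved_down
        else:
            up_nonevent += moved_up
            down_nonevent += moved_down

    event_part = (up_event - down_event) / n_events
    nonevent_part = (down_nonevent - up_nonevent) / n_nonevents
    return event_part + nonevent_part


# Toy, made-up risks from a hypothetical "old" tool and "new" model
labels   = [1, 1, 1, 0, 0, 0]
old_risk = [0.3, 0.6, 0.4, 0.6, 0.2, 0.4]
new_risk = [0.7, 0.8, 0.3, 0.3, 0.1, 0.6]
nri = net_reclassification_improvement(labels, old_risk, new_risk)
print(round(nri, 3))
```

A positive value means the new model moves more people into the correct risk category than it moves out of it.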

Diagnosis and Thresholds

The diagnosis of T2D within the training cohort participants relied on self-reported data, clinician-based T2D diagnoses, or hospital records employing the International Classification of Diseases, ninth revision (ICD-9) diagnostic codes. Validation cohort participants were categorized as either having incident or prevalent type 2 diabetes, based on self-reports.

Following the guidelines set by the National Institute for Health and Care Excellence (NICE), thresholds for “potentially undiagnosed” T2D were defined as blood glucose levels exceeding 7.0 mmol/L or glycated hemoglobin (HbA1c) levels surpassing 48 mmol/mol. To mitigate bias in the prevalence analysis, individuals with “potentially undiagnosed” T2D were excluded. In the incidence analysis, two further groups were excluded: incident T2D cases diagnosed more than eight years after baseline, and individuals who did not develop T2D but did not return to the assessment center after eight years.
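The exclusion rule above can be sketched as a simple filter. The thresholds are the ones quoted in the article. The record fields (`diagnosed_t2d`, `glucose`, `hba1c`) are hypothetical, and applying the rule only to participants without a recorded diagnosis is an assumption about what "potentially undiagnosed" means here.

```python
GLUCOSE_THRESHOLD_MMOL_L = 7.0    # blood glucose threshold quoted in the article
HBA1C_THRESHOLD_MMOL_MOL = 48.0   # HbA1c threshold quoted in the article


def is_potentially_undiagnosed(participant):
    """True if there is no recorded diagnosis but a biomarker exceeds a threshold."""
    if participant["diagnosed_t2d"]:
        return False  # already diagnosed, so not "undiagnosed"
    return (participant["glucose"] > GLUCOSE_THRESHOLD_MMOL_L
            or participant["hba1c"] > HBA1C_THRESHOLD_MMOL_MOL)


def exclude_potentially_undiagnosed(participants):
    """Drop participants flagged as potentially undiagnosed."""
    return [p for p in participants if not is_potentially_undiagnosed(p)]


# Toy, made-up records with hypothetical field names
cohort = [
    {"id": "a", "diagnosed_t2d": False, "glucose": 5.2, "hba1c": 36.0},
    {"id": "b", "diagnosed_t2d": False, "glucose": 7.4, "hba1c": 41.0},
    {"id": "c", "diagnosed_t2d": False, "glucose": 6.1, "hba1c": 50.0},
    {"id": "d", "diagnosed_t2d": True,  "glucose": 8.0, "hba1c": 55.0},
]
kept_ids = [p["id"] for p in exclude_potentially_undiagnosed(cohort)]
print(kept_ids)
```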

Validation and Comparison

In addition to developing and validating questionnaire-based models, researchers evaluated two established tools: the concise, non-laboratory Finnish Diabetes Risk Score (FINDRISC) and the clinical Australian T2D Risk Assessment Tool (AUSDRISK). These tools rely on 9 and 13 features, respectively, to predict incident T2D, spanning medical history, demographics, lifestyle, and anthropometrics.

Results Unveiled

The study included 67,083 individuals for assessing T2D incidence and 631,748 individuals for evaluating T2D prevalence. T2D incidence and prevalence rates differed markedly between non-white and white individuals. Non-white populations had up to 4.0-fold higher prevalence (ranging from 12% to 23%) and 0.5 to 3.0 times the incidence (between 1.4% and 8.2%) of the white UKBB population (6.00% and 2.80%, respectively).

Conversely, Lifelines showed a lower T2D prevalence (2%) and incidence (2%) than the white UKBB population, partly attributed to age differences between the two cohorts.
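The fold figures can be checked by dividing each quoted rate by the white UKBB baseline. The short calculation below uses only the percentages given in the paragraph above; the variable names are illustrative.

```python
# Rates quoted in the article, in percent
white_prevalence = 6.0
white_incidence = 2.8
nonwhite_prevalence_range = (12.0, 23.0)
nonwhite_incidence_range = (1.4, 8.2)

# Ratio of each non-white rate to the white UKBB baseline, rounded to 1 decimal
prevalence_fold = tuple(round(p / white_prevalence, 1) for p in nonwhite_prevalence_range)
incidence_fold = tuple(round(i / white_incidence, 1) for i in nonwhite_incidence_range)

print(prevalence_fold)
print(incidence_fold)
```

A ratio below 1 at the low end of the incidence range means that at least one non-white group had a lower incidence than the white UKBB baseline, even though most ratios are above 1.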

In the white UKBB sample, the algorithms demonstrated remarkable accuracy, correctly predicting T2D prevalence with an AUC of 0.9 and incidence over an eight-year span with an AUC of 0.9.

Impressively, these models replicated their success in the Lifelines external validation, boasting AUC values of 0.8 and 0.9 for incidence and prevalence, respectively.

Across various ethnicities, both machine learning-based models consistently delivered robust results, with AUC values ranging between 0.86 and 0.89 for prevalence and between 0.82 and 0.88 for the incidence of T2D.

Outperforming traditional non-laboratory techniques, the models effectively reclassified nearly 3,000 additional cases. The incorporation of biological markers, though not physical data, amplified model performance.

BMI and the number of drugs used emerged as pivotal factors in both prevalence and incidence models, ranking among the top three contributing characteristics. Additionally, incidence models introduced a sedentarism element, gauging time spent watching television (TV).

In forecasting T2D prevalence and incidence across diverse demographics, the UKBB-trained questionnaire-based ML models outperformed FINDRISC and AUSDRISK. These questionnaire-only models achieved a good balance between sensitivity and specificity, as well as between positive predictive value (PPV) and negative predictive value (NPV), for all populations. The sensitivity-specificity balance improved in models that incorporated biomarkers, resulting in higher PPV across various groups.
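For readers less familiar with these four metrics, here is a minimal sketch of how they follow from a screening confusion matrix. The counts are made up for illustration and do not come from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts.

    tp/fn: people with T2D flagged / missed by the model.
    tn/fp: people without T2D cleared / wrongly flagged by the model.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases that are cleared
        "ppv": tp / (tp + fp),          # share of flagged people who have T2D
        "npv": tn / (tn + fn),          # share of cleared people without T2D
    }


# Toy, made-up counts: 100 cases and 900 non-cases
metrics = screening_metrics(tp=80, fp=90, tn=810, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV depends on how common the condition is, the same sensitivity and specificity yield different PPVs in populations with different T2D prevalence.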

Reclassification gains were statistically significant in the white, Caribbean, other, and South Asian populations, where the models correctly classified more cases than clinically validated prediction tools. Notably, the inclusion of physical data consistently improved the ranking of incident cases in Lifelines. Biomarker-based models outperformed clinical methods in nearly every case.

Conclusion:

This study shows that machine learning models derived from the UK Biobank can accurately predict T2D prevalence and incidence across the ethnic groups studied, including non-white individuals. The models outperform existing methods and offer a precise, scalable, and cost-effective strategy for identifying positive cases and forecasting risk, pointing to a broader role for machine learning in healthcare.
