Development and validation of the type 2 diabetes mellitus 10-year risk score prediction models from survey data

  • Gregor Stiglic
    Corresponding author at: Zitna ulica 15, 2000 Maribor, Slovenia.
    University of Maribor, Faculty of Health Sciences, Zitna ulica 15, 2000 Maribor, Slovenia

    University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroska cesta 46, 2000 Maribor, Slovenia

    Usher Institute, University of Edinburgh, Old Medical School, Teviot Place, Edinburgh EH8 9AG, UK
  • Fei Wang
    Department of Population Health Sciences, Weill Cornell Medicine, 425 East 61 Street, New York, NY 10065
  • Aziz Sheikh
    Usher Institute, University of Edinburgh, Old Medical School, Teviot Place, Edinburgh EH8 9AG, UK
  • Leona Cilar
    University of Maribor, Faculty of Health Sciences, Zitna ulica 15, 2000 Maribor, Slovenia
Published: April 22, 2021
DOI:


      • Large cross-national surveys represent a valuable source of data.
      • This paper validates 10-year T2DM risk models built on survey data.
      • Pooling country-level data to build global prediction models can significantly improve model performance.
      • Large variance between country-level models could indicate differences in the quality of collected data.



      In this paper, we describe the development and validation of 10-year type 2 diabetes mellitus (T2DM) risk prediction models based on large-scale survey data.


      Prediction models were built and validated using data from the Survey of Health, Ageing and Retirement in Europe (SHARE), collected in 12 European countries from participants aged 50 or older. The models used 53 variables representing the participants' behavioural, physical health and mental health characteristics. To account for the strongly imbalanced outcome, each instance was weighted by the inverse of the proportion of its outcome label when the regularized logistic regression model was built.
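      The weighting step can be illustrated with a minimal sketch. It is written in Python, which may not match the software used in the study, and it assumes an L2 (ridge) penalty fitted by plain batch gradient descent; the abstract specifies neither the penalty type nor the solver. Each instance receives the weight n / n_c, where n_c is the number of instances sharing its label, so both classes contribute the same total weight to the loss. The function names, the toy data and the settings `lam`, `lr` and `epochs` are hypothetical.

```python
import math
from collections import Counter


def inverse_proportion_weights(labels):
    """Give each instance weight n / n_c, i.e. the inverse of its label's proportion."""
    n = len(labels)
    counts = Counter(labels)
    return [n / counts[label] for label in labels]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_ridge_logistic(X, y, weights, lam=0.01, lr=0.5, epochs=2000):
    """Minimise the weighted mean log-loss plus (lam / 2) * ||beta||^2 by batch
    gradient descent. The intercept is not penalised."""
    n_features = len(X[0])
    intercept = 0.0
    beta = [0.0] * n_features
    total_w = sum(weights)
    for _ in range(epochs):
        grad0 = 0.0
        grad = [0.0] * n_features
        for xi, yi, wi in zip(X, y, weights):
            p = sigmoid(intercept + sum(b * x for b, x in zip(beta, xi)))
            r = wi * (p - yi)  # weighted residual
            grad0 += r
            for j in range(n_features):
                grad[j] += r * xi[j]
        intercept -= lr * grad0 / total_w
        for j in range(n_features):
            beta[j] -= lr * (grad[j] / total_w + lam * beta[j])
    return intercept, beta


# Made-up toy data: one predictor, 2 cases out of 10 (strongly imbalanced).
X = [[0.1], [0.4], [0.35], [0.8], [0.2], [0.9], [0.15], [0.3], [0.05], [0.7]]
y = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
w = inverse_proportion_weights(y)  # cases get 10 / 2 = 5.0, non-cases 10 / 8 = 1.25
intercept, coefs = fit_weighted_ridge_logistic(X, y, w)
```

      With these weights the two cases count as much as the eight non-cases in the loss. One consequence worth noting is that weighting shifts the fitted intercept, so outputs of a weighted model overstate the risk of the rare outcome and need recalibration before they can be read as absolute risks.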


      A pooled sample of 16,363 individuals was used to build and validate a global regularized logistic regression model, which achieved an area under the receiver operating characteristic curve (AUROC) of 0.702 (95% CI: 0.698–0.706). We also measured the performance of local, country-specific models, whose AUROC ranged from 0.578 (0.565–0.592) to 0.768 (0.749–0.787).
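      An AUROC can be read as the probability that a randomly chosen participant who developed T2DM received a higher predicted risk than a randomly chosen participant who did not. The abstract does not say how the 95% confidence intervals were obtained; the sketch below assumes a percentile bootstrap, which is one common choice, and uses hypothetical function names and made-up scores.

```python
import random


def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen case (y = 1) scores higher
    than a randomly chosen non-case (y = 0); ties count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that lack one of the two classes
            stats.append(auroc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up outcomes and predicted risks.
y = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
risk = [0.1, 0.3, 0.35, 0.2, 0.8, 0.6, 0.4, 0.7, 0.05, 0.5]
point = auroc(y, risk)  # 22 of the 24 case/non-case pairs are ranked correctly
lower, upper = bootstrap_auroc_ci(y, risk)
```

      The pairwise loop costs O(n_cases × n_non-cases) and is meant only for clarity; for a sample the size of the pooled SHARE data, a rank-based computation of the same Mann-Whitney statistic would be the practical choice.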


      We have developed and validated a survey-based 10-year T2DM risk prediction model for use across 12 European countries. Our results demonstrate the importance of recalibrating the models, as well as the benefit of pooling data from multiple countries to reduce variance and thereby increase the precision of the results.
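      Recalibration can take several forms, and the abstract does not state which one was applied. A minimal sketch of one widely used option, logistic recalibration, is shown below as an assumption: the original predicted risks p are kept, and only an intercept a and a slope b are re-estimated in the target population, giving updated risks 1 / (1 + exp(-(a + b·logit(p)))). Estimates of a near 0 and b near 1 indicate that the original model was already well calibrated there. The function names and toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def logistic_recalibration(p_old, y, iters=25):
    """Fit y ~ a + b * logit(p_old) by Newton-Raphson maximum likelihood.

    Returns (a, b): a is the calibration intercept and b the calibration slope.
    """
    z = [logit(p) for p in p_old]
    a, b = 0.0, 1.0  # start from the unchanged original model
    for _ in range(iters):
        ga = gb = iaa = iab = ibb = 0.0
        for zi, yi in zip(z, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * zi)))
            v = mu * (1.0 - mu)
            ga += yi - mu          # score (gradient) for a
            gb += (yi - mu) * zi   # score for b
            iaa += v               # Fisher information entries
            iab += v * zi
            ibb += v * zi * zi
        det = iaa * ibb - iab * iab
        # Newton step: (a, b) += I^{-1} (ga, gb)
        a += (ibb * ga - iab * gb) / det
        b += (iaa * gb - iab * ga) / det
    return a, b


def recalibrated_risk(p, a, b):
    """Updated risk 1 / (1 + exp(-(a + b * logit(p))))."""
    return 1.0 / (1.0 + math.exp(-(a + b * logit(p))))


# Made-up original risks and observed outcomes in a new population.
p_old = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
a, b = logistic_recalibration(p_old, y)
p_new = [recalibrated_risk(p, a, b) for p in p_old]
```

      Because the refitted model contains an intercept, its predicted risks sum to the observed number of cases in the recalibration sample (calibration-in-the-large). Re-estimating only a, with b fixed at 1, is a simpler alternative when the recalibration sample is small.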



