Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework. (24th September 2020)
- Record Type:
- Journal Article
- Title:
- Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework. (24th September 2020)
- Main Title:
- Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework
- Authors:
- Xue, Mingyue
Su, Yinxia
Li, Chen
Wang, Shuxia
Yao, Hua
- Other Names:
- Southerland, Janet H. (Academic Editor)
- Abstract:
- Background. An estimated 425 million people globally have diabetes, accounting for 12% of the world's health expenditures, and the number continues to grow, placing a huge burden on the healthcare system, especially in remote, underserved areas. Methods. A total of 584,168 adult subjects who had participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) based on variables from physical measurements and a questionnaire. Combined with the risk factors selected by LR, we used a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. Results. XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F1 = 0.906, and AUC = 0.968). The variable importance scores in XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). Conclusions. We proposed a classifier based on LR-XGBoost that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for the risk of diabetes in the early phase, and the variable importance scores give a clue for preventing diabetes occurrence.
- Is Part Of:
- Journal of diabetes research. Volume 2020(2020)
- Journal:
- Journal of diabetes research
- Issue:
- Volume 2020(2020)
- Issue Display:
- Volume 2020, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 2020
- Issue:
- 2020
- Issue Sort Value:
- 2020-2020-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-09-24
- Subjects:
- Diabetes -- Periodicals
Diabetes -- Pathophysiology -- Periodicals
Diabetes -- Prevention -- Periodicals
Diabetes -- Etiology -- Periodicals
Diabetes -- Epidemiology -- Periodicals
Diabetes -- Pathogenesis -- Periodicals
616.462005
- Journal URLs:
- https://www.hindawi.com/journals/jdr/
- DOI:
- 10.1155/2020/6873891
- Languages:
- English
- ISSNs:
- 2314-6745
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 14397.xml