Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models. (2021)
- Record Type:
- Journal Article
- Title:
- Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models. (2021)
- Main Title:
- Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models
- Authors:
- Liu, Junjie
Sun, Yiyang
Ma, Jing
Tu, Jiachen
Deng, Yuhui
He, Ping
Li, Rongshan
Hu, Fengyun
Huang, Huaxiong
Zhou, Xiaoshuang
Xu, Shixin
- Abstract:
- Background: In China, stroke has been the leading cause of death in recent years. It is a major cause of long-term physical and cognitive impairment, which brings great pressure on the National Public Health System. China is a large country, and evaluating the risk of getting stroke is important for the prevention and treatment of stroke in China.
Methods: A data set with 2000 hospitalized stroke patients in 2018 and 27583 residents from 2017 to 2020 is analyzed in this study. With the cleaned data, three models of stroke risk levels are built using machine learning methods. The importance of the "8+2" factors from the China National Stroke Prevention Project (CSPP) is evaluated via decision tree and random forest models. The importance of more detailed features and their SHAP values is evaluated and ranked via a random forest model. Furthermore, a logistic regression model is applied to evaluate the probability of getting stroke for different risk levels.
Results: Among all "8+2" risk factors of getting stroke, the decision tree model reveals that the top three factors are Hypertension (0.4995), Physical Inactivity (0.08486) and Diabetes Mellitus (0.07889), and the random forest model shows that the top three factors are Hypertension (0.3966), Hyperlipidemia (0.1229) and Physical Inactivity (0.1146). In addition to the "8+2" factors, the importance of features for lifestyle information, demographic information and medical measurement is evaluated via a random forest model. It shows that the top five features are Systolic Blood Pressure (SBP) (0.3670), Diastolic Blood Pressure (DBP) (0.1541), Physical Inactivity (0.0904), Body Mass Index (BMI) (0.0721) and Fasting Blood Glucose (FBG) (0.0531). SHAP values show that DBP, Physical Inactivity, SBP, BMI, Smoking, FBG, and Triglyceride (TG) are positively correlated with the risk of getting stroke. High-density Lipoprotein (HDL) is negatively correlated with the risk of getting stroke. Combined with the data of 2000 hospitalized stroke patients, the logistic regression model shows that the average probabilities of getting stroke are 7.20% ± 0.55% for the low-risk level patients, 19.02% ± 0.94% for the medium-risk level patients and 83.89% ± 0.97% for the high-risk level patients.
Conclusion: Based on the census data from Shanxi Province, we investigate stroke risk factors and their ranking. It shows that Hypertension, Physical Inactivity, and Overweight are ranked as the top three high stroke risk factors in Shanxi. The probability of getting a stroke is also estimated through our interpretable machine learning methods.
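Illustrative note: the last step described in the abstract, a logistic regression that yields an average stroke probability per risk level, can be sketched as below. This is a minimal sketch on synthetic data, not the authors' implementation. The single feature (a count of risk factors present), the cut-offs in `risk_level`, and the coefficients used to generate the labels are all assumptions made for illustration and do not come from the paper or from CSPP.

```python
import math
import random
from statistics import mean


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=1000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def risk_level(x):
    # Hypothetical cut-offs on the risk-factor count, for illustration only.
    if x <= 1:
        return "low"
    if x == 2:
        return "medium"
    return "high"


# Synthetic cohort: x = number of risk factors present (0..8), y = stroke (1) or not (0).
rng = random.Random(0)
xs = [rng.randint(0, 8) for _ in range(500)]
ys = [1 if rng.random() < sigmoid(-3.0 + 1.0 * x) else 0 for x in xs]

w, b = fit_logistic(xs, ys)

# Average predicted probability within each (hypothetical) risk level.
by_level = {"low": [], "medium": [], "high": []}
for x in xs:
    by_level[risk_level(x)].append(sigmoid(w * x + b))
avg = {level: mean(ps) for level, ps in by_level.items()}

for level, p in avg.items():
    print(f"{level}: {p:.3f}")
```

Because the fitted probability increases with the risk-factor count whenever the learned weight is positive, the per-level averages come out ordered low < medium < high, which mirrors the shape of the result reported in the abstract without reproducing its numbers.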
- Is Part Of:
- Informatics in medicine unlocked. Volume 26(2022)
- Journal:
- Informatics in medicine unlocked
- Issue:
- Volume 26(2022)
- Issue Display:
- Volume 26, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 26
- Issue:
- 2022
- Issue Sort Value:
- 2022-0026-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2021
- Subjects:
- Stroke -- Machine learning -- Risk factor ranking -- SHAP value
Medical informatics -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/23529148/
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.imu.2021.100712
- Languages:
- English
- ISSNs:
- 2352-9148
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21061.xml