TyG-er: An ensemble Regression Forest approach for identification of clinical factors related to insulin resistance condition using Electronic Health Records. (September 2019)
- Record Type:
- Journal Article
- Title:
- TyG-er: An ensemble Regression Forest approach for identification of clinical factors related to insulin resistance condition using Electronic Health Records. (September 2019)
- Main Title:
- TyG-er: An ensemble Regression Forest approach for identification of clinical factors related to insulin resistance condition using Electronic Health Records
- Authors:
- Bernardini, Michele
Morettini, Micaela
Romeo, Luca
Frontoni, Emanuele
Burattini, Laura - Abstract:
- Abstract: Background: Insulin resistance is an early-stage deterioration of Type 2 diabetes. Identification and quantification of insulin resistance requires specific blood tests; however, the triglyceride-glucose (TyG) index can provide a surrogate assessment from routine Electronic Health Record (EHR) data. Since insulin resistance is a multi-factorial condition, to improve its characterisation, this study aims to discover non-trivial clinical factors in EHR data to determine where the insulin-resistance condition is encoded. Methods: We proposed a high-interpretable Machine Learning approach (i.e., ensemble Regression Forest combined with data imputation strategies), named TyG-er. We applied three different experimental procedures to test TyG-er reliability on the Italian Federation of General Practitioners dataset, named FIMMG_ obs dataset, which is publicly available and reflects the clinical use-case (i.e., not all laboratory exams are prescribed on a regular basis over time). Results: Results detected non-conventional clinical factors (i.e., uricemia, leukocytes, gamma-glutamyltransferase and protein profile) and provided novel insight into the best combination of clinical factors for detecting early glucose tolerance deterioration. The robustness of these extracted clinical factors was confirmed by the high agreement (from 0.664 to 0.911 of Lin's correlation coefficient ( r c )) of the TyG-er approach among different experimental procedures. Moreover, the results ofAbstract: Background: Insulin resistance is an early-stage deterioration of Type 2 diabetes. Identification and quantification of insulin resistance requires specific blood tests; however, the triglyceride-glucose (TyG) index can provide a surrogate assessment from routine Electronic Health Record (EHR) data. Since insulin resistance is a multi-factorial condition, to improve its characterisation, this study aims to discover non-trivial clinical factors in EHR data to determine where the insulin-resistance condition is encoded. Methods: We proposed a high-interpretable Machine Learning approach (i.e., ensemble Regression Forest combined with data imputation strategies), named TyG-er. We applied three different experimental procedures to test TyG-er reliability on the Italian Federation of General Practitioners dataset, named FIMMG_ obs dataset, which is publicly available and reflects the clinical use-case (i.e., not all laboratory exams are prescribed on a regular basis over time). Results: Results detected non-conventional clinical factors (i.e., uricemia, leukocytes, gamma-glutamyltransferase and protein profile) and provided novel insight into the best combination of clinical factors for detecting early glucose tolerance deterioration. The robustness of these extracted clinical factors was confirmed by the high agreement (from 0.664 to 0.911 of Lin's correlation coefficient ( r c )) of the TyG-er approach among different experimental procedures. Moreover, the results of the three experimental procedures outlined the predictive power of the TyG-er approach (up to a mean absolute error of 5.68 % and r c = 0.666, p < . 05 ). Conclusions: The TyG-er approach is able to carry information about the identification of the TyG index, strictly correlated with the insulin-resistance condition, while extracting the most relevant non-glycemic features from routine data. Highlights: TyG-er identifies clinical factors correlated to insulin resistance. TyG-er is based on Regression Forest with data imputation strategies. TyG-er relies on 80 non-glycemic predictors from Electronic Health Records. TyG-er indicates uricemia, leukocytes, γGT and protein profile as novel predictors. … (more)
- Is Part Of:
- Computers in biology and medicine. Volume 112(2019)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 112(2019)
- Issue Display:
- Volume 112, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 112
- Issue:
- 2019
- Issue Sort Value:
- 2019-0112-2019-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-09
- Subjects:
- Insulin resistance -- Pre-diabetes -- Pattern recognition -- Random forest -- Laboratory screening -- Missing values
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285 - Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/ ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.compbiomed.2019.103358 ↗
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 11635.xml