Inter operator variability of machine learning researchers predicting all-cause mortality in patients admitted to intensive care unit. (14th October 2021)
- Record Type:
- Journal Article
- Title:
- Inter operator variability of machine learning researchers predicting all-cause mortality in patients admitted to intensive care unit. (14th October 2021)
- Main Title:
- Inter operator variability of machine learning researchers predicting all-cause mortality in patients admitted to intensive care unit
- Authors:
- Jones, Y
Cleland, J
Li, C
Pellicori, P
Friday, J
- Abstract:
- Background: The number of publications using machine learning (ML) to predict cardiovascular outcomes and identify clusters of patients at greater risk has risen dramatically in recent years. However, research papers which use ML often fail to provide sufficient information about their algorithms to enable results to be replicated by others in the same or different datasets.
Aim: To test the reproducibility of results from ML algorithms given three different levels of information commonly found in publications: model type alone, a description of the model, and complete algorithm.
Methods: MIMIC-III is a healthcare dataset comprising detailed information from over 60,000 intensive care unit (ICU) admissions from the Beth Israel Deaconess Medical Centre between 2001 and 2012. Access is available to everyone pending approval and completion of a short training course. Using this dataset, three models for predicting all-cause in-hospital mortality were created, two from a PhD student working in ML, and one from an existing research paper which used the same dataset and provided complete model information. A second researcher (a PhD student in ML and cardiology) was given the same dataset and was tasked with reproducing their results. Initially, this second researcher was told what type of model was created in each case, followed by a brief description of the algorithms. Finally, the complete algorithms from each participant were provided. In all three scenarios, recreated models were compared to original models using Area Under the Receiver Operating Characteristic Curve (AUC).
Results: After excluding those younger than 18 years and events with missing or invalid entries, 21,139 ICU admissions remained from 18,094 patients between 2001 and 2012, including 2,797 in-hospital deaths. Three models were produced: two Recurrent Neural Networks (RNNs) which differed significantly in internal weights and variables, and a Boosted Tree Classifier (BTC). The AUC of the first reproduced RNN matched that of the original RNN (Figure 1); however, the second RNN and the BTC could not be reproduced given model type alone. As more information was provided about these algorithms, the results from the reproduced models matched the original results more closely.
Conclusions: In order to create clinically useful ML tools with results that are reproducible and consistent, it is vital that researchers share enough detail about their models. Model type alone is not enough to guarantee reproducibility. Although some models can be recreated with limited information, this is not always the case, and the best results are found when the complete algorithm is shared. These findings have huge relevance when trying to apply ML in clinical practice.
Funding Acknowledgement: Type of funding sources: None.
- Is Part Of:
- European heart journal. Volume 42(2021)Supplement 1
- Journal:
- European heart journal
- Issue:
- Volume 42(2021)Supplement 1
- Issue Display:
- Volume 42, Issue 1 (2021)
- Year:
- 2021
- Volume:
- 42
- Issue:
- 1
- Issue Sort Value:
- 2021-0042-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-10-14
- Subjects:
- Artificial Intelligence (Machine Learning, Deep Learning)
Cardiology -- Periodicals
Heart -- Diseases -- Periodicals
616.12005
- Journal URLs:
- http://eurheartj.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/eurheartj/ehab724.3052
- Languages:
- English
- ISSNs:
- 0195-668X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3829.717500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25611.xml