286 Collider-stratification bias when estimating variable importance using Random Forests. (2nd September 2021)
- Record Type:
- Journal Article
- Title:
- 286 Collider-stratification bias when estimating variable importance using Random Forests. (2nd September 2021)
- Main Title:
- 286 Collider-stratification bias when estimating variable importance using Random Forests
- Authors:
- Long, Stephanie
Lefebvre, Genevieve
Schuster, Tibor
- Abstract:
- Background: Advances in causal inference have helped explain the long-standing birthweight and obesity paradoxes as selection bias due to conditioning on a collider variable, i.e. collider-stratification bias (CSB). The lessons learned have critical implications for the interpretation of machine learning (ML) methods, including decision trees and random forests (RFs), that implicitly condition on input variables. RFs are a popular approach for identifying important "predictors" in large datasets through variable importance, defined as the average decrease in prediction accuracy. While CSB has become a recognized concern when estimating exposure-outcome effects, knowledge of its impact on ML variable importance measures (VIMs) is limited. Applying the causal inference framework, we investigated the accuracy of RF VIMs under data-generating mechanisms prone to CSB. Methods: A Monte Carlo simulation study was conducted, with binary outcome and collider variables generated from logistic models. Two exposure variables stochastically determined both the outcome and a collider variable that was causally independent of the outcome. VIMs from RFs were compared with the known causal relevance of the input variables for the outcome. Results: While the variable importance of the true exposure variables was not systematically affected by CSB, the validity of VIMs could be affected, leading to the erroneous selection of collider variables, causally independent of the outcome, as outcome predictors. Conclusions: In the presence of CSB, VIMs are not valid measures of the causal relevance of variables and may mislead the selection of truly important factors that affect the outcome. Key messages: ML must consider causal data-generating mechanisms; otherwise, it may lead to an erroneous assessment of variable importance for outcome prediction.
- Is Part Of:
- International journal of epidemiology. Volume 50(2021)Supplement 1
- Journal:
- International journal of epidemiology
- Issue:
- Volume 50(2021)Supplement 1
- Issue Display:
- Volume 50, Issue 1 (2021)
- Year:
- 2021
- Volume:
- 50
- Issue:
- 1
- Issue Sort Value:
- 2021-0050-0001-0000
- Publication Date:
- 2021-09-02
- Subjects:
- Epidemiology -- Periodicals
614.4
- Journal URLs:
- http://ije.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/ije/dyab168.399
- Languages:
- English
- ISSNs:
- 0300-5771
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.244000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19885.xml
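The data-generating mechanism described in the abstract's Methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' simulation code: the logistic coefficients, sample size, seed and helper names (`simulate`, `log_odds_ratio`, `risk`) are hypothetical choices made here. Two independent binary exposures X1 and X2 raise the log-odds of both the outcome Y and the collider C, and C has no effect on Y. The script shows (i) that restricting to one stratum of C, the kind of conditioning a tree split on C performs, induces an association between X1 and X2 that is absent in the full sample, and (ii) that C is still marginally associated with Y through its shared causes, which is how a causally irrelevant collider can look like a useful predictor. It does not fit a random forest or compute VIMs.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n: int, seed: int = 2021) -> list[tuple[int, int, int, int]]:
    """Hypothetical mechanism: X1, X2 -> Y and X1, X2 -> C; C has no effect on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = int(rng.random() < 0.5)
        x2 = int(rng.random() < 0.5)  # drawn independently of x1
        # Outcome depends only on the two exposures (assumed coefficients).
        y = int(rng.random() < sigmoid(-1.0 + 1.0 * x1 + 1.0 * x2))
        # Collider: a common effect of X1 and X2, drawn independently of Y.
        c = int(rng.random() < sigmoid(-2.0 + 2.0 * x1 + 2.0 * x2))
        rows.append((x1, x2, y, c))
    return rows


def log_odds_ratio(pairs) -> float:
    """Log odds ratio of the 2x2 table built from binary (a, b) pairs."""
    n = [[0, 0], [0, 0]]
    for a, b in pairs:
        n[a][b] += 1
    return math.log((n[1][1] * n[0][0]) / (n[1][0] * n[0][1]))


def risk(rows, c_value: int) -> float:
    """Empirical P(Y = 1 | C = c_value)."""
    ys = [y for (_, _, y, c) in rows if c == c_value]
    return sum(ys) / len(ys)


rows = simulate(20_000)
# Association between the exposures in the whole sample (independent by design).
marginal = log_odds_ratio((x1, x2) for (x1, x2, _, _) in rows)
# Same association after conditioning on the collider, as a split on C would.
within_c1 = log_odds_ratio((x1, x2) for (x1, x2, _, c) in rows if c == 1)
# C differs in outcome risk only because it shares causes with Y.
risk_diff = risk(rows, 1) - risk(rows, 0)

print(f"log OR(X1, X2), full sample:  {marginal:+.2f}")
print(f"log OR(X1, X2), within C = 1: {within_c1:+.2f}")
print(f"P(Y=1 | C=1) - P(Y=1 | C=0):  {risk_diff:+.2f}")
```

Under these assumed coefficients the population odds ratio between X1 and X2 is exactly 1 in the full sample, but within either stratum of C it equals 4 * expit(-2) * expit(2), about 0.42 (log OR about -0.87), so the simulated estimates should land near those values.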