Addressing the problem of missing data in decision tree modeling. Issue 3 (17th February 2018)
- Record Type:
- Journal Article
- Title:
- Addressing the problem of missing data in decision tree modeling. Issue 3 (17th February 2018)
- Main Title:
- Addressing the problem of missing data in decision tree modeling
- Authors:
- Haji-Maghsoudi, Saiedeh
Rastegari, Azam
Garrusi, Behshid
Baneshi, Mohammad Reza
- Abstract:
- Tree-based models (TBMs) can handle missing data using the surrogate approach (SUR). The aim of this study is to compare the performance of statistical imputation against the performance of SUR in TBMs. Employing empirical data, a TBM was constructed. Thereafter, 10%, 20%, and 40% of the values of the variable that appeared as the first split were deleted and then imputed without and with the outcome variable in the imputation model (IMP− and IMP+, respectively). This was repeated one thousand times. Absolute relative bias above 0.10 was defined as severe (SARB). Subsequently, in a series of simulations, the following parameters were changed: the degree of correlation among variables, the number of variables truly associated with the outcome, and the missing rate. At a 10% missing rate, the proportion of times SARB was observed in either SUR or IMP− was two times higher than in IMP+ (28% versus 13%). When the missing rate was increased to 20%, all these proportions approximately doubled. Irrespective of the missing rate, IMP+ was about 65% less likely to produce SARB than SUR. Results of IMP− and SUR were comparable up to a 20% missing rate. At a high missing rate, IMP− was 76% more likely to provide SARB estimates. Statistical imputation of missing data, with the outcome variable included in the imputation model, is recommended, even in the context of TBMs.
- Is Part Of:
- Journal of applied statistics. Volume 45:Issue 3(2018)
- Journal:
- Journal of applied statistics
- Issue:
- Volume 45:Issue 3(2018)
- Issue Display:
- Volume 45, Issue 3 (2018)
- Year:
- 2018
- Volume:
- 45
- Issue:
- 3
- Issue Sort Value:
- 2018-0045-0003-0000
- Page Start:
- 547
- Page End:
- 557
- Publication Date:
- 2018-02-17
- Subjects:
- Tree -- missing -- surrogate -- imputation -- prediction
Statistics -- Periodicals
519.5
- Journal URLs:
- http://www.tandfonline.com/loi/cjas20
http://www.tandfonline.com/
- DOI:
- 10.1080/02664763.2017.1284184
- Languages:
- English
- ISSNs:
- 0266-4763
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4947.110000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5530.xml
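
The abstract classifies an estimate as severely biased (SARB) when its absolute relative bias exceeds 0.10, and reports the proportion of repetitions in which that happens. A minimal Python sketch of that bookkeeping follows. The abstract does not give the formula, so the definition |estimate − reference| / |reference| is an assumption, and the function names and example numbers are hypothetical, not taken from the article.

```python
from typing import Sequence

# From the abstract: absolute relative bias above 0.10 counts as severe (SARB).
SEVERE_THRESHOLD = 0.10


def absolute_relative_bias(estimate: float, reference: float) -> float:
    """Assumed definition (not stated in the abstract):
    |estimate - reference| / |reference|."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(estimate - reference) / abs(reference)


def severe_proportion(
    estimates: Sequence[float],
    reference: float,
    threshold: float = SEVERE_THRESHOLD,
) -> float:
    """Share of repetitions whose absolute relative bias exceeds the threshold."""
    if not estimates:
        raise ValueError("estimates must not be empty")
    flags = [absolute_relative_bias(e, reference) > threshold for e in estimates]
    return sum(flags) / len(flags)


# Hypothetical estimates from four repetitions, compared with a
# complete-data reference value of 1.0; two of the four are severe.
example = severe_proportion([1.0, 1.05, 1.2, 0.8], reference=1.0)
```

In a study like the one described, `estimates` would hold one value per repetition for a given method (SUR, IMP− or IMP+) and missing rate, so the returned proportions could be compared across methods.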