Performance Comparison of Recent Imputation Methods for Classification Tasks over Binary Data. Issue 1 (2nd January 2017)
- Record Type:
- Journal Article
- Title:
- Performance Comparison of Recent Imputation Methods for Classification Tasks over Binary Data. Issue 1 (2nd January 2017)
- Main Title:
- Performance Comparison of Recent Imputation Methods for Classification Tasks over Binary Data
- Authors:
- Ghorbani, Soroosh
Desmarais, Michel C.
- Abstract:
- This paper evaluates the effect on the predictive accuracy of different models of two recently proposed imputation methods, namely missForest (MF) and Multiple Imputation based on Expectation-Maximization (MIEM), along with two other imputation methods: Sequential Hot-deck and Multiple Imputation based on Logistic Regression (MILR). Their effect is assessed over the classification accuracy of four different models, namely Tree Augmented Naive Bayes (TAN), which has received little attention, Naive Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM) with Radial Basis Function (RBF) kernel. Experiments are conducted over fourteen binary datasets with large feature sets, and across a wide range of missing data rates (between 5 and 50%). The results from 10-fold cross-validation show that the performance of the imputation methods varies substantially between different classifiers and at different rates of missing values. The MIEM method is shown to generally give the best results for all the classifiers across all rates of missing data. While the NB model does not benefit much from imputation compared to a no-imputation baseline, LR and TAN are highly susceptible to gain from the imputation methods at higher rates of missing values. The results also show that MF works best with TAN, and Hot-deck degrades the predictive performance of SVM and NB models at high rates of missing values (over 30%). Detailed analysis of the imputation methods over the different datasets is reported. Implications of these findings on the choice of an imputation method are discussed.
- Is Part Of:
- Applied artificial intelligence. Volume 31:Issue 1(2017)
- Journal:
- Applied artificial intelligence
- Issue:
- Volume 31:Issue 1(2017)
- Issue Display:
- Volume 31, Issue 1 (2017)
- Year:
- 2017
- Volume:
- 31
- Issue:
- 1
- Issue Sort Value:
- 2017-0031-0001-0000
- Page Start:
- 1
- Page End:
- 22
- Publication Date:
- 2017-01-02
- Subjects:
- Artificial intelligence -- Periodicals
006.3
- Journal URLs:
- http://www.tandfonline.com/toc/uaai20/current
http://www.tandfonline.com/
- DOI:
- 10.1080/08839514.2017.1279046
- Languages:
- English
- ISSNs:
- 0883-9514
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1571.650000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 320.xml