An extensive analysis of the interaction between missing data types, imputation methods, and supervised classifiers. (15th December 2017)
- Record Type:
- Journal Article
- Title:
- An extensive analysis of the interaction between missing data types, imputation methods, and supervised classifiers. (15th December 2017)
- Main Title:
- An extensive analysis of the interaction between missing data types, imputation methods, and supervised classifiers
- Authors:
- Garciarena, Unai
Santana, Roberto
- Abstract:
- Highlights: Interactions between missing data, imputation method and classifier are investigated. Interaction between imputation method complexity and classifier can be deduced. Complex imputation provides superior consistent results over simple methods. Certain combinations of the three components studied produce distinguished behaviors. Different behaviors of the imputers for varying amount of missingness are reported.
Abstract: When applying data-mining techniques to real-world data, we often find ourselves facing observations that have no value recorded for some attributes. This can be caused by several phenomena, such as a machine's incapability to record certain characteristics or a person refusing to answer a question in a poll. Depending on that motivation, values gone missing may follow one kind of pattern or another, or describe no regularity at all. One approach to palliate the effect of missing data on machine learning tasks is to replace the missing observations. Imputation algorithms attempt to calculate a value for a missing gap, using information associated with it, i.e., the attribute and/or other values in the same observation. While several imputation methods have been proposed in the literature, few works have addressed the question of the relationship between the type of missing data, the choice of the imputation method, and the effectiveness of classification algorithms that used the imputed data. In this paper we address the relationship among these three factors. By constructing a benchmark of hundreds of databases containing different types of missing data, and applying several imputation methods and classification algorithms, we empirically show that an interaction between imputation methods and supervised classification can be deduced. Besides, differences in terms of classification performance for the same imputation method in different missing data patterns have been found. This points to the convenience of considering the combined choice of the imputation method and the classifier algorithm according to the missing data type.
- Is Part Of:
- Expert systems with applications. Volume 89(2017)
- Journal:
- Expert systems with applications
- Issue:
- Volume 89(2017)
- Issue Display:
- Volume 89, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 89
- Issue:
- 2017
- Issue Sort Value:
- 2017-0089-2017-0000
- Page Start:
- 52
- Page End:
- 65
- Publication Date:
- 2017-12-15
- Subjects:
- Missing data -- Imputation methods -- Supervised classifiers -- Machine learning
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2017.07.026
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 4634.xml