The impact of heterogeneous distance functions on missing data imputation and classification performance. (May 2022)
- Record Type:
- Journal Article
- Title:
- The impact of heterogeneous distance functions on missing data imputation and classification performance. (May 2022)
- Main Title:
- The impact of heterogeneous distance functions on missing data imputation and classification performance
- Authors:
- Santos, Miriam Seoane
Abreu, Pedro Henriques
Fernández, Alberto
Luengo, Julián
Santos, João
- Abstract:
- This work performs an in-depth study of the impact of distance functions on K-Nearest Neighbours imputation of heterogeneous datasets. Missing data is generated at several percentages, on a large benchmark of 150 datasets (50 continuous, 50 categorical and 50 heterogeneous datasets) and data imputation is performed using different distance functions (HEOM, HEOM-R, HVDM, HVDM-R, HVDM-S, MDE and SIMDIST) and k values (1, 3, 5 and 7). The impact of distance functions on kNN imputation is then evaluated in terms of classification performance, through the analysis of a classifier learned from the imputed data, and in terms of imputation quality, where the quality of the reconstruction of the original values is assessed. By analysing the properties of heterogeneous distance functions over continuous and categorical datasets individually, we then study their behaviour over heterogeneous data. We discuss whether datasets with different natures may benefit from different distance functions and to what extent the component of a distance function that deals with missing values influences such choice. Our experiments show that missing data has a significant impact on distance computation and the obtained results provide guidelines on how to choose appropriate distance functions depending on data characteristics (continuous, categorical or heterogeneous datasets) and the objective of the study (classification or imputation tasks).
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 111(2022)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 111(2022)
- Issue Display:
- Volume 111, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 111
- Issue:
- 2022
- Issue Sort Value:
- 2022-0111-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-05
- Subjects:
- Missing data -- Data imputation -- kNN -- Distance functions -- Heterogeneous data
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2022.104791
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21244.xml
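
The abstract in this record describes kNN imputation driven by heterogeneous distance functions such as HEOM. As an illustration only, the following is a minimal sketch of that idea, assuming the commonly cited HEOM definition (overlap metric for categorical attributes, range-normalised absolute difference for continuous ones, and a per-attribute distance of 1 whenever either value is missing). It is not the authors' implementation. The function names `heom` and `knn_impute`, the use of `None` to mark missing values, and the mean/mode aggregation of neighbour values are all assumptions made for this example.

```python
import math
from collections import Counter


def heom(x, y, is_categorical, ranges):
    """HEOM distance between two records; None marks a missing value (assumption)."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # a missing value yields the maximal per-attribute distance
        elif is_categorical[a]:
            d = 0.0 if xa == ya else 1.0  # overlap metric
        else:
            # range-normalised absolute difference (0 if the attribute is constant)
            d = abs(xa - ya) / ranges[a] if ranges[a] else 0.0
        total += d * d
    return math.sqrt(total)


def knn_impute(data, is_categorical, k=3):
    """Fill each missing cell from the k nearest records that observe that attribute."""
    n_attrs = len(is_categorical)

    # Ranges of continuous attributes, computed from observed values only.
    ranges = []
    for a in range(n_attrs):
        if is_categorical[a]:
            ranges.append(None)
        else:
            observed = [row[a] for row in data if row[a] is not None]
            ranges.append(max(observed) - min(observed) if observed else 0.0)

    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for a in range(n_attrs):
            if row[a] is not None:
                continue
            # Candidate donors: every other record with attribute a observed.
            donors = [
                (heom(row, other, is_categorical, ranges), other[a])
                for j, other in enumerate(data)
                if j != i and other[a] is not None
            ]
            donors.sort(key=lambda pair: pair[0])
            values = [value for _, value in donors[:k]]
            if not values:
                continue  # no donor available: leave the cell missing
            if is_categorical[a]:
                imputed[i][a] = Counter(values).most_common(1)[0][0]  # mode
            else:
                imputed[i][a] = sum(values) / len(values)  # mean
    return imputed


# Toy heterogeneous dataset: continuous, categorical, continuous.
data = [
    [1.0, "a", 10.0],
    [2.0, "a", 20.0],
    [9.0, "b", 90.0],
    [1.5, "a", None],
]
result = knn_impute(data, is_categorical=[False, True, False], k=2)
print(result[3])
```

In this toy dataset the last record is closest to the first two, so its missing value is filled with the mean of their third attribute. Distances are computed on the original data, so imputed values never influence later neighbour searches; that is a design choice of this sketch, not something stated in the record.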