A comparison of classification methods across different data complexity scenarios and datasets. (15th April 2021)
- Record Type:
- Journal Article
- Title:
- A comparison of classification methods across different data complexity scenarios and datasets. (15th April 2021)
- Main Title:
- A comparison of classification methods across different data complexity scenarios and datasets
- Authors:
- Scholz, Michael
Wimmer, Tristan
- Abstract:
- Recent research assessed the performance of classification methods mainly on concrete datasets whose statistical characteristics are unknown or unreported. The performance furthermore is often determined by only one performance measure, such as the area under the receiver operating characteristic curve. The performance of several classification methods in four different complexity scenarios and on datasets described by five data characteristics is compared in this paper. Synthetical datasets are used to control their statistical characteristics and real datasets are used to verify our findings. The performance of each classification method is determined by six measures. The investigation reveals that heterogeneous classifiers perform best on average, bagged CART is especially recommendable for datasets with low dimensionality and high sample size, kernel-based classification methods perform very well especially with a polynomial kernel, but require a rather long time for training and a nearest shrunken neighbor classifier is recommendable in case of unbalanced datasets. These findings help researchers and practitioners finding an appropriate method for their binary classification problems.
Highlights: We compare methods for binary classification on synthetic datasets. We generate data for four complexity scenarios and with five data characteristics. Heterogeneous ensembles perform best on average. Nearest shrunken centroids are recommendable for unbalanced training data. Bagged CART is recommendable for large training data with low dimensionality. …
- Is Part Of:
- Expert systems with applications. Volume 168(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 168(2021)
- Issue Display:
- Volume 168, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 168
- Issue:
- 2021
- Issue Sort Value:
- 2021-0168-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-04-15
- Subjects:
- Binary classification -- Classification methods -- Performance comparison -- Data characteristics
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2020.114217
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15538.xml