The impact of class imbalance in classification performance metrics based on the binary confusion matrix. (July 2019)
- Record Type:
- Journal Article
- Title:
- The impact of class imbalance in classification performance metrics based on the binary confusion matrix. (July 2019)
- Main Title:
- The impact of class imbalance in classification performance metrics based on the binary confusion matrix
- Authors:
- Luque, Amalia
Carrasco, Alejandro
Martín, Alejandro
de las Heras, Ana
- Abstract:
- Highlights: Imbalance coefficient fosters measuring imbalance. Geometric Mean and Bookmaker Informedness constitute the best unbiased metrics. Matthews Correlation Coefficient is the best option for error consideration. The concept of Class Balance Accuracy can be extended to other metrics.
Abstract: A major issue in the classification of class imbalanced datasets involves the determination of the most suitable performance metrics to be used. In previous work using several examples, it has been shown that imbalance can exert a major impact on the value and meaning of accuracy and on certain other well-known performance metrics. In this paper, our approach goes beyond simply studying case studies and develops a systematic analysis of this impact by simulating the results obtained using binary classifiers. A set of functions and numerical indicators are attained which enables the comparison of the behaviour of several performance metrics based on the binary confusion matrix when they are faced with imbalanced datasets. Throughout the paper, a new way to measure the imbalance is defined which surpasses the Imbalance Ratio used in previous studies. From the simulation results, several clusters of performance metrics have been identified that involve the use of Geometric Mean or Bookmaker Informedness as the best null-biased metrics if their focus on classification successes (dismissing the errors) presents no limitation for the specific application where they are used. However, if classification errors must also be considered, then the Matthews Correlation Coefficient arises as the best choice. Finally, a set of null-biased multi-perspective Class Balance Metrics is proposed which extends the concept of Class Balance Accuracy to other performance metrics.
- Is Part Of:
- Pattern recognition. Volume 91(2019:Jul.)
- Journal:
- Pattern recognition
- Issue:
- Volume 91(2019:Jul.)
- Issue Display:
- Volume 91 (2019)
- Year:
- 2019
- Volume:
- 91
- Issue Sort Value:
- 2019-0091-0000-0000
- Page Start:
- 216
- Page End:
- 231
- Publication Date:
- 2019-07
- Subjects:
- Classification -- Performance measures -- Imbalanced datasets -- Class Balance Metrics
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.02.023
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9741.xml