A multiple combined method for rebalancing medical data with class imbalances. (July 2021)
- Record Type:
- Journal Article
- Title:
- A multiple combined method for rebalancing medical data with class imbalances. (July 2021)
- Main Title:
- A multiple combined method for rebalancing medical data with class imbalances
- Authors:
- Wang, Yun-Chun
Cheng, Ching-Hsue
- Abstract:
- Most classification algorithms assume that classes are in a balanced state. However, datasets with class imbalances are everywhere. The classes of actual medical datasets are imbalanced, severely impacting identification models and even sacrificing the classification accuracy of the minority class, even though it is the most influential and representative. The medical field has irreversible characteristics. Its tolerance rate for misjudgment is relatively low, and errors may cause irreparable harm to patients. Therefore, this study proposes a multiple combined method to rebalance medical data featuring class imbalances. The combined methods include (1) resampling methods (synthetic minority oversampling technique [SMOTE] and undersampling [US]), (2) particle swarm optimization (PSO), and (3) MetaCost. This study conducted two experiments with nine medical datasets to verify and compare the proposed method with the listing methods. A decision tree is used to generate decision rules for easy understanding of the research results. The results show that (1) the proposed method with ensemble learning can improve the area under a receiver operating characteristic curve (AUC), recall, precision, and F1 metrics; (2) MetaCost can increase sensitivity; (3) SMOTE can effectively enhance AUC; (4) US can improve sensitivity, F1, and misclassification costs in data with a high-class imbalance ratio; and (5) PSO-based attribute selection can increase sensitivity and reduce data dimension. Finally, we suggest that the dataset with an imbalanced ratio >9 must use the US results to make the decision. As the imbalanced ratio is < 9, the decision-maker can simultaneously consider the results of SMOTE and US to identify the best decision.
Highlights: The proposed combined-multiple method includes SMOTE, US, PSO attribute selection, and MetaCost. The results show that ensemble learning can improve AUC, recall, precision, and F1 metrics. The experimental results show MetaCost increasing sensitivity, and SMOTE can effectively enhance AUC. US can improve sensitivity, F1, and misclassification costs in data with high-class imbalance ratio. PSO-based attribute selection can increase sensitivity and reduce data dimension.
- Is Part Of:
- Computers in biology and medicine. Volume 134(2021)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 134(2021)
- Issue Display:
- Volume 134, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 134
- Issue:
- 2021
- Issue Sort Value:
- 2021-0134-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07
- Subjects:
- Class imbalance -- Synthetic minority oversampling technique -- Particle swarm optimization -- MetaCost
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2021.104527
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17435.xml