A study of data pre-processing techniques for imbalanced biomedical data classification. (14th August 2020)
- Record Type:
- Journal Article
- Title:
- A study of data pre-processing techniques for imbalanced biomedical data classification. (14th August 2020)
- Main Title:
- A study of data pre-processing techniques for imbalanced biomedical data classification
- Authors:
- Liu, Shigang
Zhang, Jun
Xiang, Yang
Zhou, Wanlei
Xiang, Dongxi
- Abstract:
- Biomedical data are widely used to develop prediction models for identifying specific tumours, drug discovery and human cancer detection. However, previous studies have usually focused on different classifiers and overlooked the class imbalance problem in real-world biomedical datasets. This paper reviews and evaluates some popular and recently developed resampling and feature selection (FS) methods for class imbalance learning, taking the data distribution into account. Experimental results show that: 1) resampling and FS techniques perform better with a support vector machine (SVM) classifier; 2) techniques such as random undersampling and FS perform better than other data pre-processing methods on data with a T location-scale distribution when using SVM and K-nearest neighbours (KNN) classifiers, while random oversampling outperforms other methods on data with a negative binomial distribution when using Random Forest at lower imbalance ratios; 3) FS outperforms other data pre-processing methods in most cases, so FS with an SVM classifier is the best choice for imbalanced biomedical data learning.
- Is Part Of:
- International journal of bioinformatics research and applications. Volume 16:Number 3(2020)
- Journal:
- International journal of bioinformatics research and applications
- Issue:
- Volume 16:Number 3(2020)
- Issue Display:
- Volume 16, Issue 3 (2020)
- Year:
- 2020
- Volume:
- 16
- Issue:
- 3
- Issue Sort Value:
- 2020-0016-0003-0000
- Page Start:
- 290
- Page End:
- 318
- Publication Date:
- 2020-08-14
- Subjects:
- class-imbalance -- data distribution -- classification -- biomedical data -- resampling -- feature selection
Bioinformatics -- Periodicals
570.285
- Journal URLs:
- http://www.inderscience.com/browse/index.php?journalID=155
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1744-5485
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13586.xml