Weighted ReliefF with threshold constraints of feature selection for imbalanced data classification. (18th February 2020)
- Record Type:
- Journal Article
- Title:
- Weighted ReliefF with threshold constraints of feature selection for imbalanced data classification. (18th February 2020)
- Main Title:
- Weighted ReliefF with threshold constraints of feature selection for imbalanced data classification
- Authors:
- Song, Yan
Si, Weiyun
Dai, Feifan
Yang, Guisong - Abstract:
- Summary: Feature selection is a useful method for fulfilling the data classification since the inherent heterogeneity of data and the redundancy of features are often encountered in the current data exploding era. Some commonly used feature selection algorithms, which include but are not limited to Pearson, maximal information coefficient, and ReliefF, are well‐posed under the assumption that instances are distributed homogenously in datasets. However, such an assumption might be not true in the practice. As such, in the presence of data imbalance, these traditional feature selection algorithms might be invalid due to their prejudices to the minority class, which includes few samples. The purpose of the addressed problem in this article is to develop an effective feature selection algorithm for imbalanced judicial datasets, which is capable of extracting essential features while deleting negligible ones according to the practical feature requirements. To achieve this goal, the number and the distribution of samples in each class are fully taken into consideration for the correlation analysis. Compared with the traditional feature selection algorithms, the proposed improved ReliefF algorithm is equipped with: (i) different weights of features according to the characteristics of heterogeneous samples in different classes; (ii) justice for imbalanced datasets; and (iii) threshold constraints resulting from the practical feature requirements. Finally, experiments on a judicialSummary: Feature selection is a useful method for fulfilling the data classification since the inherent heterogeneity of data and the redundancy of features are often encountered in the current data exploding era. Some commonly used feature selection algorithms, which include but are not limited to Pearson, maximal information coefficient, and ReliefF, are well‐posed under the assumption that instances are distributed homogenously in datasets. However, such an assumption might be not true in the practice. As such, in the presence of data imbalance, these traditional feature selection algorithms might be invalid due to their prejudices to the minority class, which includes few samples. The purpose of the addressed problem in this article is to develop an effective feature selection algorithm for imbalanced judicial datasets, which is capable of extracting essential features while deleting negligible ones according to the practical feature requirements. To achieve this goal, the number and the distribution of samples in each class are fully taken into consideration for the correlation analysis. Compared with the traditional feature selection algorithms, the proposed improved ReliefF algorithm is equipped with: (i) different weights of features according to the characteristics of heterogeneous samples in different classes; (ii) justice for imbalanced datasets; and (iii) threshold constraints resulting from the practical feature requirements. Finally, experiments on a judicial dataset and six public datasets well illustrate the effectiveness and the superiority of the proposed feature selection algorithm in improving the classification accuracy for imbalanced datasets. … (more)
- Is Part Of:
- Concurrency and computation. Volume 32:Number 14(2020)
- Journal:
- Concurrency and computation
- Issue:
- Volume 32:Number 14(2020)
- Issue Display:
- Volume 32, Issue 14 (2020)
- Year:
- 2020
- Volume:
- 32
- Issue:
- 14
- Issue Sort Value:
- 2020-0032-0014-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2020-02-18
- Subjects:
- data imbalance -- feature selection -- improved ReliefF algorithm -- threshold constraints
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35 - Journal URLs:
- http://onlinelibrary.wiley.com/ ↗
- DOI:
- 10.1002/cpe.5691 ↗
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 13322.xml