Multi-criteria feature selection on cost-sensitive data with missing values. (March 2016)
- Record Type:
- Journal Article
- Title:
- Multi-criteria feature selection on cost-sensitive data with missing values. (March 2016)
- Main Title:
- Multi-criteria feature selection on cost-sensitive data with missing values
- Authors:
- Shu, Wenhao
Shen, Hong
- Abstract:
- Feature selection plays an important role in pattern recognition and machine learning. Confronted with high dimensional data in many data analysis tasks, feature selection techniques are designed to find a relevant feature subset of the original features which can facilitate classification. However, in many real-world applications, missing feature values that contribute to test and misclassification costs are emerging to be an issue of increasing concern for most data sets, particularly dealing with big data. The existing feature selection approaches do not address this issue effectively. In this paper, based on rough set theory we address the problem of feature selection for cost-sensitive data with missing values. We first propose a multi-criteria evaluation function to characterize the significance of candidate features, by taking into consideration not only the power in the positive region and boundary region but also their associated costs. On this basis, we develop a forward greedy feature selection algorithm for selecting a feature subset of minimized cost that preserves the same information as the whole feature set. In addition, to improve the efficiency of this algorithm, we implement the selection of candidate features in a dwindling object set. Finally, we demonstrate the superior performance of the proposed algorithm to the existing feature selection algorithms through experimental results on different data sets.
Highlights: A multi-criteria based evaluation function is proposed for measuring features from different viewpoints. A dwindling universe is provided to accelerate the feature selection process. A feature selection algorithm is developed on cost-sensitive data with missing values. The efficiency and effectiveness of the proposed algorithm are demonstrated on different data sets.
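The forward greedy selection over a dwindling object set described in the abstract can be illustrated with a minimal sketch. It is not the authors' MCFS algorithm: the scoring rule (newly consistent objects divided by one plus the feature's test cost), the use of `None` as the missing-value marker, and the names `tolerant`, `positive_objects`, and `greedy_select` are all assumptions made for illustration, and misclassification costs and the boundary region are left out.

```python
from typing import List

MISSING = None  # assumed marker for a missing feature value


def tolerant(a, b, feats) -> bool:
    """Two objects are tolerant if they agree on every feature where both values are known."""
    return all(a[f] is MISSING or b[f] is MISSING or a[f] == b[f] for f in feats)


def positive_objects(rows, labels, objs, feats) -> List[int]:
    """Objects whose tolerance class (restricted to objs) carries a single decision label."""
    return [
        i for i in objs
        if all(labels[j] == labels[i] for j in objs if tolerant(rows[i], rows[j], feats))
    ]


def greedy_select(rows, labels, costs) -> List[int]:
    """Forward greedy selection; returns feature indices in the order they were chosen."""
    all_feats = list(range(len(costs)))
    remaining = list(range(len(rows)))
    # Objects that the full feature set classifies consistently.
    target = set(positive_objects(rows, labels, remaining, all_feats))
    selected: List[int] = []
    covered: set = set()
    while covered != target and len(selected) < len(all_feats):
        best = None
        for f in all_feats:
            if f in selected:
                continue
            pos = positive_objects(rows, labels, remaining, selected + [f])
            # Hypothetical score: consistent objects gained per unit of (1 + test cost);
            # ties are broken in favour of the cheaper feature.
            key = (len(pos) / (1.0 + costs[f]), -costs[f])
            if best is None or key > best[0]:
                best = (key, f, pos)
        _, f, pos = best
        selected.append(f)
        covered.update(pos)
        # Dwindling object set: drop objects that are already classified consistently.
        remaining = [i for i in remaining if i not in covered]
    return selected


# Toy data: three features, one missing value, made-up test costs.
rows = [("a", "x", "p"), ("a", "y", "p"), ("b", None, "q"), ("b", "y", "q")]
labels = [0, 1, 0, 1]
costs = [5.0, 1.0, 1.0]
print(greedy_select(rows, labels, costs))
```

On this toy data the sketch picks the two cheap features (indices 1 and 2) and skips the expensive one, since together they already separate every object that the full feature set can separate.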
- Is Part Of:
- Pattern recognition. Volume 51(2016:Mar.)
- Journal:
- Pattern recognition
- Issue:
- Volume 51(2016:Mar.)
- Issue Display:
- Volume 51 (2016)
- Year:
- 2016
- Volume:
- 51
- Issue Sort Value:
- 2016-0051-0000-0000
- Page Start:
- 268
- Page End:
- 280
- Publication Date:
- 2016-03
- Subjects:
- Algorithm MCFS Multi-criteria based feature selection algorithm on cost-sensitive data with missing values
Feature selection -- Cost-sensitive data -- Multi-criteria -- Incomplete data -- Rough sets
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2015.09.016
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 59.xml