A Fast and Accurate Feature Selection Algorithm Based on Binary Consistency Measure. (24th September 2015)
- Record Type:
- Journal Article
- Title:
- A Fast and Accurate Feature Selection Algorithm Based on Binary Consistency Measure. (24th September 2015)
- Main Title:
- A Fast and Accurate Feature Selection Algorithm Based on Binary Consistency Measure
- Authors:
- Shin, Kilho
Miyazaki, Seiya
- Abstract:
- Consistency‐based feature selection is an important category of feature selection research. Its advantage over other categories comes from its consistency measures, which incorporate the effect of interaction among features into the evaluation of feature relevance. Even if features individually appear irrelevant to class labels, they can collectively show strong relevance. In such cases, we say that the features interact with each other. Consistency measures, in this regard, evaluate the collective relevance of a set of features and have been intuitively understood as metrics that measure the distance of an arbitrary feature set from the state of being consistent: a set of features is said to be consistent if, and only if, they as a whole determine class labels. Historically, the binary consistency measure, which returns the value 1 if the feature set is consistent and 0 otherwise, was the first consistency measure introduced, and many advanced measures followed. The problem with the binary measure is that it always returns 0 if a data set includes no consistent feature set. The measures that followed have solved this problem but sacrificed time efficiency of evaluation. Therefore, feature selection leveraging these measures is not fast enough to apply to large data sets. In this article, we aim to improve the time efficiency of consistency‐based feature selection. To achieve this goal, we propose a new idea, which we call data set denoising: we eliminate examples that are viewed as noise from a data set until the data set comes to include consistent feature sets, and then apply the binary measure to find an appropriate feature set that is consistent. In our evaluation through intensive experiments, CWC, a new algorithm that implements data set denoising, outperformed the benchmark consistency‐based algorithms in both time efficiency and accuracy. Specifically, CWC was about 31 times faster than LCC, which had been known as the fastest in the literature. Furthermore, in a comparison including feature selection algorithms that are not consistency‐based, CWC turned out to be one of the fastest and most accurate feature selection algorithms.
- Is Part Of:
- Computational intelligence. Volume 32:Number 4(2016)
- Journal:
- Computational intelligence
- Issue:
- Volume 32:Number 4(2016)
- Issue Display:
- Volume 32, Issue 4 (2016)
- Year:
- 2016
- Volume:
- 32
- Issue:
- 4
- Issue Sort Value:
- 2016-0032-0004-0000
- Page Start:
- 646
- Page End:
- 667
- Publication Date:
- 2015-09-24
- Subjects:
- feature selection, filter approach, consistency
Artificial intelligence -- Periodicals
Computational linguistics -- Periodicals
006.3
- Journal URLs:
- http://www.blackwellpublishing.com/journal.asp?ref=0824-7935&site=1
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/coin.12072
- Languages:
- English
- ISSNs:
- 0824-7935
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.595000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 1503.xml
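
Illustrative note on the abstract: the binary consistency measure it describes can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' CWC algorithm; the function name `binary_consistency`, the toy data, and the row/label layout are hypothetical and chosen purely for illustration. It returns 1 when the selected features as a whole determine the class label and 0 otherwise, and the XOR-style data shows two features that look irrelevant individually but are consistent together.

```python
from typing import Hashable, Sequence


def binary_consistency(
    rows: Sequence[Sequence[Hashable]],
    labels: Sequence[Hashable],
    features: Sequence[int],
) -> int:
    """Return 1 if the selected feature columns determine the label, else 0."""
    seen = {}
    for row, label in zip(rows, labels):
        # Project the example onto the selected feature columns.
        key = tuple(row[i] for i in features)
        # Two examples with the same projected values but different labels
        # make the feature set inconsistent.
        if seen.setdefault(key, label) != label:
            return 0
    return 1


# Hypothetical XOR-style toy data: the label is feature 0 XOR feature 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

print(binary_consistency(rows, labels, [0]))     # 0: feature 0 alone is inconsistent
print(binary_consistency(rows, labels, [1]))     # 0: feature 1 alone is inconsistent
print(binary_consistency(rows, labels, [0, 1]))  # 1: together they determine the label

# One conflicting example makes even the full feature set inconsistent,
# so the binary measure returns 0 for every feature subset.
noisy_rows = rows + [(1, 1)]
noisy_labels = labels + [1]
print(binary_consistency(noisy_rows, noisy_labels, [0, 1]))  # 0
```

The last call shows the limitation the abstract mentions: once a data set contains conflicting examples, the binary measure can no longer distinguish between feature sets, which is the situation the data set denoising idea is meant to address.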