Fast hybrid dimensionality reduction method for classification based on feature selection and grouped feature extraction. (15th July 2020)
- Record Type:
- Journal Article
- Title:
- Fast hybrid dimensionality reduction method for classification based on feature selection and grouped feature extraction. (15th July 2020)
- Main Title:
- Fast hybrid dimensionality reduction method for classification based on feature selection and grouped feature extraction
- Authors:
- Li, Mengmeng
Wang, Haofeng
Yang, Lifang
Liang, You
Shang, Zhigang
Wan, Hong
- Abstract:
- Highlights: A fast hybrid dimensionality reduction method for classification is proposed. Multi-strategy based feature selection is used to filter out irrelevant features. Grouped feature extraction is used to remove redundancy among features. The proposed method shows excellent efficiency and competitive classification performance.
Abstract: Dimensionality reduction is one basic and critical technology for data mining, especially in current "big data" era. As two different types of methods, feature selection and feature extraction each have their pros and cons. In this paper, we combine multi-strategy feature selection and grouped feature extraction and propose a novel fast hybrid dimension reduction method, incorporating their advantages of removing irrelevant and redundant information. Firstly, the intrinsic dimensionality of the data set is estimated by the maximum likelihood estimation method. Fisher Score and Information Gain based feature selection are used as multi-strategy methods to remove irrelevant features. With the redundancy among the selected features as clustering criterion, they are grouped into a certain amount of clusters. In every cluster, Principal Component Analysis (PCA) based feature extraction is carried out to remove redundant information. Four classical classifiers and representation entropy are used to evaluate the classification performance and information loss of the reduced set. The runtime results of different methods show that the proposed hybrid method is consistently much faster than the other three in almost all of the sets used. Meanwhile, the proposed method shows competitive classification performance, which has no significant difference basically compared with the other methods. The proposed method reduces the dimensionality of the raw data fast and it has excellent efficiency and competitive classification performance compared with the contrastive methods.
- Is Part Of:
- Expert systems with applications. Volume 150(2020)
- Journal:
- Expert systems with applications
- Issue:
- Volume 150(2020)
- Issue Display:
- Volume 150, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 150
- Issue:
- 2020
- Issue Sort Value:
- 2020-0150-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-07-15
- Subjects:
- Dimensionality Reduction -- Intrinsic Dimensionality -- Feature Selection -- Feature Cluster -- PCA
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2020.113277
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13415.xml
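
Illustrative note: the abstract in this record names Fisher Score as one of the criteria used to filter out irrelevant features. The sketch below shows one common textbook form of that score (between-class scatter divided by within-class scatter, computed per feature). It is not the authors' implementation; the function names `fisher_scores` and `top_k_features` and the toy data are assumptions made for illustration only.

```python
from statistics import fmean


def fisher_scores(rows, labels):
    """Return one Fisher score per feature column.

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    (a common textbook form; assumed here, not taken from the paper).
    """
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        overall_mean = fmean(column)
        between = 0.0  # class means spread around the overall mean
        within = 0.0   # spread of the values inside each class
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            class_mean = fmean(values)
            # Population variance of the class (divides by n_c).
            class_var = fmean([(v - class_mean) ** 2 for v in values])
            between += len(values) * (class_mean - overall_mean) ** 2
            within += len(values) * class_var
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(rows, labels, k):
    """Return the indices of the k highest-scoring features."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 2.0]]
labels = [0, 0, 1, 1]
print(top_k_features(rows, labels, 1))
```

In the method the abstract describes, a ranking of this kind would be only the first stage; the information-gain criterion, the redundancy-based grouping and the per-cluster PCA are not shown here.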