A filter feature selection method based on the Maximal Information Coefficient and Gram-Schmidt Orthogonalization for biomedical data mining. (1st October 2017)
- Record Type:
- Journal Article
- Title:
- A filter feature selection method based on the Maximal Information Coefficient and Gram-Schmidt Orthogonalization for biomedical data mining. (1st October 2017)
- Main Title:
- A filter feature selection method based on the Maximal Information Coefficient and Gram-Schmidt Orthogonalization for biomedical data mining
- Authors:
- Lyu, Hongqiang
Wan, Mingxi
Han, Jiuqiang
Liu, Ruiling
Wang, Cheng - Abstract:
- Abstract: A filter feature selection technique has been widely used to mine biomedical data. Recently, in the classical filter method minimal-Redundancy-Maximal-Relevance (mRMR), a risk has been revealed that a specific part of the redundancy, called irrelevant redundancy, may be involved in the minimal-redundancy component of this method. Thus, a few attempts to eliminate the irrelevant redundancy by attaching additional procedures to mRMR, such as Kernel Canonical Correlation Analysis based mRMR (KCCAmRMR), have been made. In the present study, a novel filter feature selection method based on the Maximal Information Coefficient (MIC) and Gram-Schmidt Orthogonalization (GSO), named Orthogonal MIC Feature Selection (OMICFS), was proposed to solve this problem. Different from other improved approaches under the max-relevance and min-redundancy criterion, in the proposed method, the MIC is used to quantify the degree of relevance between feature variables and target variable, the GSO is devoted to calculating the orthogonalized variable of a candidate feature with respect to previously selected features, and the max-relevance and min-redundancy can be indirectly optimized by maximizing the MIC relevance between the GSO orthogonalized variable and target. This orthogonalization strategy allows OMICFS to exclude the irrelevant redundancy without any additional procedures. To verify the performance, OMICFS was compared with other filter feature selection methods in terms of bothAbstract: A filter feature selection technique has been widely used to mine biomedical data. Recently, in the classical filter method minimal-Redundancy-Maximal-Relevance (mRMR), a risk has been revealed that a specific part of the redundancy, called irrelevant redundancy, may be involved in the minimal-redundancy component of this method. Thus, a few attempts to eliminate the irrelevant redundancy by attaching additional procedures to mRMR, such as Kernel Canonical Correlation Analysis based mRMR (KCCAmRMR), have been made. In the present study, a novel filter feature selection method based on the Maximal Information Coefficient (MIC) and Gram-Schmidt Orthogonalization (GSO), named Orthogonal MIC Feature Selection (OMICFS), was proposed to solve this problem. Different from other improved approaches under the max-relevance and min-redundancy criterion, in the proposed method, the MIC is used to quantify the degree of relevance between feature variables and target variable, the GSO is devoted to calculating the orthogonalized variable of a candidate feature with respect to previously selected features, and the max-relevance and min-redundancy can be indirectly optimized by maximizing the MIC relevance between the GSO orthogonalized variable and target. This orthogonalization strategy allows OMICFS to exclude the irrelevant redundancy without any additional procedures. To verify the performance, OMICFS was compared with other filter feature selection methods in terms of both classification accuracy and computational efficiency by conducting classification experiments on two types of biomedical datasets. The results showed that OMICFS outperforms the other methods in most cases. In addition, differences between these methods were analyzed, and the application of OMICFS in the mining of high-dimensional biomedical data was discussed. The Matlab code for the proposed method is available athttps://github.com/lhqxinghun/bioinformatics/tree/master/OMICFS/ . Highlights: A novel filter feature selection method named OMICFS is proposed. MIC statistics is employed to quantify the relevance between features and target. An orthogonalization strategy is used to deal with the irrelevant redundancy risk. The performance is compared in terms of both accuracy and efficiency. … (more)
- Is Part Of:
- Computers in biology and medicine. Volume 89(2017)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 89(2017)
- Issue Display:
- Volume 89, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 89
- Issue:
- 2017
- Issue Sort Value:
- 2017-0089-2017-0000
- Page Start:
- 264
- Page End:
- 274
- Publication Date:
- 2017-10-01
- Subjects:
- Filter feature selection -- Maximal Information Coefficient (MIC) -- Gram-Schmidt Orthogonalization (GSO) -- Biomedical data mining
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285 - Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/ ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.compbiomed.2017.08.021 ↗
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 4956.xml