Stable feature selection using copula based mutual information. (April 2021)
- Record Type:
- Journal Article
- Title:
- Stable feature selection using copula based mutual information. (April 2021)
- Main Title:
- Stable feature selection using copula based mutual information
- Authors:
- Lall, Snehalika
Sinha, Debajyoti
Ghosh, Abhik
Sengupta, Debarka
Bandyopadhyay, Sanghamitra
- Abstract:
- Highlights: A novel feature selection method that incorporates copula-based multivariate dependency into mutual information, which removes the need to average over multiple instances of bivariate dependencies. The method is unbiased against noisy datasets due to the scale-invariant property of the copula. The method can be applied to datasets in which the ratio of sample size to class size is large enough, even when the original marginal distributions are unknown. The method also satisfies the maximum relevance and minimum redundancy criteria of feature selection.
Abstract: Feature selection is a key step in many machine learning tasks. Most existing feature selection methods address the problem by devising a scoring function that treats the features independently, thereby overlooking their interdependencies. We leverage the scale invariance property of the copula to construct a greedy, supervised feature selection algorithm that maximizes feature relevance while minimizing redundant information content. A multivariate copula is used in the proposed Copula Based Feature Selection (CBFS) to discover the dependence structure between features. The incorporation of copula-based multivariate dependency in the formulation of mutual information helps avoid averaging over multiple instances of bivariate dependencies, thus eliminating the average estimation error introduced when bivariate dependency is used between a pair of feature variables. Under a controlled setting, our algorithm outperformed the existing best-practice methods in warding off noise in the data. On several real and synthetic datasets, the proposed algorithm performed competitively in maximizing classification accuracy. CBFS also outperforms the other methods in terms of noise tolerance. …
- Is Part Of:
- Pattern recognition. Volume 112(2021)
- Journal:
- Pattern recognition
- Issue:
- Volume 112(2021)
- Issue Display:
- Volume 112, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 112
- Issue:
- 2021
- Issue Sort Value:
- 2021-0112-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-04
- Subjects:
- Copula -- Feature selection -- Mutual information -- Stability -- Classification accuracy
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2020.107697
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15745.xml
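
The abstract in this record describes a greedy, supervised selection that relies on the scale invariance of the copula. The sketch below illustrates that general idea only and is not the authors' CBFS algorithm. It rank-transforms each feature to the copula scale, estimates mutual information from binned values, and greedily picks features by relevance minus redundancy. The function names, the bin count, and the pairwise mRMR-style redundancy term are all assumptions made for illustration; the paper replaces pairwise averaging with a multivariate copula dependency, which this sketch does not implement.

```python
import numpy as np

def copula_transform(x):
    """Map a sample to pseudo-observations in (0, 1) using its ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    return ranks / (len(x) + 1.0)

def discrete_mi(a, b):
    """Plug-in mutual information (nats) between two discrete label arrays."""
    a = np.unique(np.asarray(a).ravel(), return_inverse=True)[1]
    b = np.unique(np.asarray(b).ravel(), return_inverse=True)[1]
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)  # contingency table of co-occurrence counts
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / outer[nz])))

def greedy_select(X, y, k, bins=8):
    """Hypothetical helper: greedy relevance-minus-redundancy selection
    on rank-transformed (copula-scale) features. Pairwise redundancy is an
    mRMR-style stand-in, not the multivariate dependency used by CBFS."""
    U = np.column_stack([copula_transform(X[:, j]) for j in range(X.shape[1])])
    B = np.minimum((U * bins).astype(int), bins - 1)  # bin codes per feature
    relevance = [discrete_mi(B[:, j], y) for j in range(B.shape[1])]
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(B.shape[1]):
            if j in selected:
                continue
            redundancy = (
                np.mean([discrete_mi(B[:, j], B[:, s]) for s in selected])
                if selected else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 600
y = rng.integers(0, 2, size=n)
f0 = y + 0.3 * rng.standard_normal(n)   # strongly informative
f1 = np.exp(f0)                         # monotone rescaling of f0
f2 = rng.standard_normal(n)             # pure noise
f3 = y + 1.0 * rng.standard_normal(n)   # weakly informative
X = np.column_stack([f0, f1, f2, f3])

selected = greedy_select(X, y, k=2)
print(selected)
```

Because ranks are unchanged by a strictly increasing transformation, `f0` and `np.exp(f0)` map to the same pseudo-observations, so the rescaled copy contributes no new information and is penalised as fully redundant once `f0` has been chosen.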