A novel multivariate filter method for feature selection in text classification problems. (April 2018)
- Record Type:
- Journal Article
- Title:
- A novel multivariate filter method for feature selection in text classification problems. (April 2018)
- Main Title:
- A novel multivariate filter method for feature selection in text classification problems
- Authors:
- Labani, Mahdieh
Moradi, Parham
Ahmadizar, Fardin
Jalili, Mahdi
- Abstract:
- With the increasing number of documents in digital format, automatic text categorization has become a crucial task in pattern recognition problems. To ease the classification task, feature selection methods have been introduced to reduce the dimensionality of the feature space, and thus improve the classification performance. In this paper a novel filter method for feature selection, called Multivariate Relative Discrimination Criterion (MRDC), is proposed for text classification. The proposed method focuses on the reduction of redundant features using minimal-redundancy and maximal-relevancy concepts. To this end, the proposed method takes into account document frequencies for each term, while estimating their usefulness. The proposed method not only selects the features with maximum relevancy, but also takes the redundancy between them into account using a correlation metric. MRDC does not employ any learning algorithm to evaluate the usefulness of the selected features, and thus it can be categorized as a filter method. In order to assess the effectiveness of the proposed method, several experiments are performed on three real-world datasets. The obtained results are compared to the state-of-the-art filter methods. The reported results show that in most cases MRDC results in better classification performance than others.
Highlights: We proposed a feature selection method called MRDC for text classification tasks. MRDC is classified as a filter, multivariate and supervised feature selection method. MRDC selects maximum relevant and minimum redundant features. The method has been compared to well-known univariate and multivariate methods. The results show that MRDC outperforms the other univariate and multivariate methods.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 70(2017:Oct.)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 70(2017:Oct.)
- Issue Display:
- Volume 70 (2017)
- Year:
- 2017
- Volume:
- 70
- Issue Sort Value:
- 2017-0070-0000-0000
- Page Start:
- 25
- Page End:
- 37
- Publication Date:
- 2018-04
- Subjects:
- Text classification -- Feature selection -- Dimensionality reduction -- Filter approach -- Multivariate analysis
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2017.12.014
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5947.xml
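
The abstract describes a filter that selects features with maximum relevancy while penalising redundancy through a correlation metric. The sketch below illustrates that general idea only; it is not the MRDC formula from the article, which this record does not give. Two stand-ins are assumed: relevance is the gap between a term's document-frequency rates in two classes, and redundancy is the mean absolute Pearson correlation with the terms already chosen. All function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def relevance(column, labels):
    """Assumed stand-in score: gap between document-frequency rates in class 1 and class 0."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_features(docs, labels, k):
    """Greedily pick k term indices by relevance minus mean redundancy."""
    columns = list(zip(*docs))  # one tuple of 0/1 presence flags per term
    rel = [relevance(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return rel[j]
            red = sum(abs(pearson(columns[j], columns[s])) for s in selected)
            return rel[j] - red / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy binary document-term matrix: rows are documents, columns are terms.
# Term 1 duplicates term 0; term 2 is weaker but nearly uncorrelated with term 0.
docs = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
chosen = select_features(docs, labels, k=2)
print(chosen)
```

On this toy matrix the duplicate term is passed over in favour of a less relevant but less redundant one. No classifier is trained at any point, which is what makes the procedure a filter rather than a wrapper.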