Feature selection methods for document clustering: a comparative study and a hybrid solution. (9th July 2019)
- Record Type:
- Journal Article
- Title:
- Feature selection methods for document clustering: a comparative study and a hybrid solution. (9th July 2019)
- Main Title:
- Feature selection methods for document clustering: a comparative study and a hybrid solution
- Authors:
- Benghabrit, Asmaa
Ouhbi, Brahim
Frikh, Bouchra
Zemmouri, El Moukhtar
Behja, Hicham
- Abstract:
- The proliferation of the web makes exploring and using the huge amount of available unstructured text documents challenging, which drives the need for document clustering. Improving the performance of this mechanism through feature selection therefore seems worth investigating. This paper proposes an efficient way to benefit from feature selection for document clustering. We first present a review and comparative studies of feature selection methods in order to identify the efficient ones. We then propose sequential and hybrid modes of combining statistical and semantic techniques, in order to benefit from the crucial information that each of them provides for document clustering. Extensive experiments demonstrate the benefit of the proposed combination approaches. Document clustering performance is highest when the measures based on the Chi-square statistic and on mutual information are linearly combined; doing so avoids the unwanted correlation that the sequential approach creates between the two treatments.
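The abstract reports that a linear combination of Chi-square-based and mutual-information-based measures gives the best clustering results, but the record does not give the exact formula. The Python sketch below shows one generic way such a combination could be computed; it is an illustration, not the authors' method. Assumptions (all hypothetical): each document is a set of terms; both measures are computed against some labelling of the documents (in a clustering setting this might be provisional cluster assignments, for example from an initial k-means run); each measure is min-max normalised before mixing; the weight `alpha` and all function names are illustrative.

```python
import math


def contingency(docs, labels, term):
    """Build a 2 x K count table: row 0 = term absent, row 1 = term present;
    one column per distinct label (sorted)."""
    classes = sorted(set(labels))
    col = {c: j for j, c in enumerate(classes)}
    table = [[0] * len(classes) for _ in range(2)]
    for doc, lab in zip(docs, labels):
        table[1 if term in doc else 0][col[lab]] += 1
    return table


def _margins(table):
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return n, rows, cols


def chi_square(table):
    """Pearson chi-square statistic of term presence vs. label."""
    n, rows, cols = _margins(table)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def mutual_information(table):
    """Mutual information (in nats) between term presence and label."""
    n, rows, cols = _margins(table)
    mi = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            o = table[i][j]
            if o > 0:  # o > 0 implies r > 0 and c > 0, so no division by zero
                mi += (o / n) * math.log(o * n / (r * c))
    return mi


def min_max(scores):
    """Rescale a {term: score} dict to [0, 1] so the two measures are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}


def combined_selection(docs, labels, k, alpha=0.5):
    """Rank terms by alpha * chi2_norm + (1 - alpha) * mi_norm and keep the top k.
    `alpha` is a hypothetical mixing weight, not a value taken from the paper."""
    vocab = sorted(set().union(*docs))
    tables = {t: contingency(docs, labels, t) for t in vocab}
    chi = min_max({t: chi_square(tab) for t, tab in tables.items()})
    mi = min_max({t: mutual_information(tab) for t, tab in tables.items()})
    combined = {t: alpha * chi[t] + (1 - alpha) * mi[t] for t in vocab}
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-combined[t], t))[:k]


docs = [{"stock", "market", "price"}, {"stock", "trade", "price"},
        {"goal", "match", "team"}, {"team", "match", "coach"}]
labels = [0, 0, 1, 1]  # hypothetical labels, e.g. provisional cluster ids
print(combined_selection(docs, labels, k=3))  # ['match', 'price', 'stock']
```

Because each measure is rescaled to [0, 1] before mixing, neither dominates merely through its numeric range; setting `alpha = 1` or `alpha = 0` reduces the sketch to a pure Chi-square or pure mutual-information ranking.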
- Is Part Of:
- International journal of data analysis techniques and strategies. Volume 11:Number 3(2019)
- Journal:
- International journal of data analysis techniques and strategies
- Issue:
- Volume 11:Number 3(2019)
- Issue Display:
- Volume 11, Issue 3 (2019)
- Year:
- 2019
- Volume:
- 11
- Issue:
- 3
- Issue Sort Value:
- 2019-0011-0003-0000
- Page Start:
- 246
- Page End:
- 272
- Publication Date:
- 2019-07-09
- Subjects:
- document clustering -- feature selection -- statistical and semantic data analysis -- chi-square statistic -- mutual information -- k-means algorithm -- comparative study -- hybrid solution
Electronic data processing -- Periodicals
Database searching -- Periodicals
005
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijdats
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1755-8050
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 10848.xml