Turning from TF-IDF to TF-IGM for term weighting in text classification. (30th December 2016)
- Record Type:
- Journal Article
- Title:
- Turning from TF-IDF to TF-IGM for term weighting in text classification. (30th December 2016)
- Main Title:
- Turning from TF-IDF to TF-IGM for term weighting in text classification
- Authors:
- Chen, Kewen
Zhang, Zuping
Long, Jun
Zhang, Hao
- Abstract:
- Highlights: A new supervised term weighting scheme called TF-IGM is proposed. It adopts a new statistical model to measure a term's class distinguishing power. It makes full use of the fine-grained term distribution across different classes. It is adaptive to different text datasets by providing options or parameters. It outperforms TF-IDF and state-of-the-art supervised term weighting schemes.
Abstract: Massive textual data management and mining usually rely on automatic text classification technology. Term weighting is a basic problem in text classification and directly affects the classification accuracy. Since the traditional TF-IDF (term frequency & inverse document frequency) is not fully effective for text classification, various alternatives have been proposed by researchers. In this paper we make comparative studies on different term weighting schemes and propose a new term weighting scheme, TF-IGM (term frequency & inverse gravity moment), as well as its variants. TF-IGM incorporates a new statistical model to precisely measure the class distinguishing power of a term. Particularly, it makes full use of the fine-grained term distribution across different classes of text. The effectiveness of TF-IGM is validated by extensive experiments of text classification using SVM (support vector machine) and kNN (k nearest neighbors) classifiers on three commonly used corpora. The experimental results show that TF-IGM outperforms the famous TF-IDF and the state-of-the-art supervised term weighting schemes. In addition, some new findings different from previous studies are obtained and analyzed in depth in the paper.
- Is Part Of:
- Expert systems with applications. Volume 66(2016)
- Journal:
- Expert systems with applications
- Issue:
- Volume 66(2016)
- Issue Display:
- Volume 66, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 66
- Issue:
- 2016
- Issue Sort Value:
- 2016-0066-2016-0000
- Page Start:
- 245
- Page End:
- 260
- Publication Date:
- 2016-12-30
- Subjects:
- Term weighting -- Text classification -- Inverse gravity moment (IGM) -- Class distinguishing power -- Classifier
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2016.09.009
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5.xml