Using modified term frequency to improve term weighting for text classification. (May 2021)
- Record Type:
- Journal Article
- Title:
- Using modified term frequency to improve term weighting for text classification. (May 2021)
- Main Title:
- Using modified term frequency to improve term weighting for text classification
- Authors:
- Chen, Long
Jiang, Liangxiao
Li, Chaoqun
- Abstract:
- Text classification (TC) is an essential task of natural language processing (NLP). In order to improve the performance of TC, term weighting is often used to obtain effective text representation by assigning appropriate weights to each term. A term weighting scheme is generally composed of term frequency factor, collection frequency factor and normalization factor. The normalization factor is commonly used as an optional factor to offset the influence of document length. Through the investigation of the existing term weighting schemes, we found that most of them focus on finding a more effective collection frequency factor, but rarely pay attention to finding a new term frequency factor. In this paper, we first proposed a new term frequency factor called modified term frequency (MTF). Different from the normalization factor, MTF directly modifies the raw term frequency based on the length information of all training documents. Then we proposed a new term weighting scheme by combining MTF with an existing collection frequency factor called modified distinguishing feature selector (MDFS). We denoted our scheme by MTF-MDFS (MDFS-based MTF). Extensive experimental results on 19 benchmark text datasets and 6 real-world text datasets show that our proposed MTF and MTF-MDFS are all much better than their state-of-the-art competitors in terms of the classification accuracy and the weighted average of F1 of widely used base classifiers, such as MNB, SVM and LR.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 101(2021)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 101(2021)
- Issue Display:
- Volume 101, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 101
- Issue:
- 2021
- Issue Sort Value:
- 2021-0101-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-05
- Subjects:
- Text classification -- Term weighting -- Term frequency factor -- Collection frequency factor
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2021.104215
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16331.xml
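
The abstract in this record describes a term weighting scheme as a combination of a term frequency factor, a collection frequency factor and an optional normalization factor. The record does not give the MTF or MDFS formulas, so the sketch below illustrates only that three-part structure, using classic TF-IDF with L2 normalization as a stand-in. The function name `tfidf_weights` and the toy documents are assumptions made for illustration and do not come from the paper.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Weight terms as tf * idf, then L2-normalize each document vector.

    Classic TF-IDF is a stand-in here; it is NOT the paper's MTF or MDFS.
    """
    n_docs = len(docs)
    # Collection frequency factor: inverse document frequency per term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

    weighted = []
    for doc in docs:
        # Term frequency factor: raw count of each term in the document.
        tf = Counter(doc)
        vec = {term: count * idf[term] for term, count in tf.items()}
        # Normalization factor: divide by the vector's L2 norm to offset
        # the influence of document length.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        weighted.append(
            {term: (w / norm if norm > 0 else 0.0) for term, w in vec.items()}
        )
    return weighted


# Hypothetical toy corpus of already-tokenized documents.
docs = [
    ["text", "classification", "text"],
    ["term", "weighting", "text"],
    ["term", "frequency"],
]
weights = tfidf_weights(docs)
```

As the abstract puts it, MTF modifies the raw term frequency using the length information of all training documents. In a sketch like this one, that would mean replacing the raw count in the term frequency step instead of relying only on the final normalization step.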