Hierarchy construction and text classification based on the relaxation strategy and least information model. (15th June 2018)
- Record Type:
- Journal Article
- Title:
- Hierarchy construction and text classification based on the relaxation strategy and least information model. (15th June 2018)
- Main Title:
- Hierarchy construction and text classification based on the relaxation strategy and least information model
- Authors:
- Du, Yongping
Liu, Jingxuan
Ke, Weimao
Gong, Xuemei
- Abstract:
- Highlights: Hierarchical classification is an effective approach to categorizing large-scale text data. The relaxation strategy effectively alleviates the impact of the 'blocking' problem. A new term weighting approach based on the Least Information Theory is proposed. It offers a new information quantification model based on changes between probability distributions.
Abstract: Hierarchical classification is an effective approach to categorizing large-scale text data. We introduce a relaxation strategy into the traditional hierarchical classification method to improve system performance. During hierarchy construction, our method delays the node assignment of an uncertain category until it can be classified clearly. This alleviates the 'blocking' problem, in which a classification error made at a higher level of the hierarchy propagates to the lower levels. A new term weighting approach based on the Least Information Theory (LIT) is adopted for hierarchical classification. It quantifies information as changes in probability distributions and offers a new document representation model in which the contribution of each term can be weighted appropriately. The experimental results show that the relaxation approach builds a more reasonable hierarchy and further improves classification performance. It also outperforms other classification methods such as SVM (Support Vector Machine) in terms of efficiency, making it better suited to large-scale text classification tasks. Compared to the classic term weighting method TF*IDF, LIT-based methods achieve significant improvements in classification performance.
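The abstract describes the relaxation strategy only at a high level (delaying the node judgment of an uncertain category). The sketch below is a hypothetical illustration of one way such deferred assignment could work when splitting categories into child groups during top-down hierarchy construction. The cosine-similarity routing, the seed vectors, the `margin` threshold, and the names `relaxed_split` and `cosine` are all assumptions for illustration, not the authors' published algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def relaxed_split(centroids, seeds, margin=0.1):
    """Hypothetical relaxed split of categories into child groups.

    centroids: dict mapping category name -> centroid vector (assumed input)
    seeds: one vector per child group (assumed seeding scheme)
    margin: assumed threshold; if the best and second-best similarities
            differ by less than this, the category is deferred (kept at the
            current node) instead of being forced into one child.
    Returns (groups, deferred).
    """
    groups = [[] for _ in seeds]
    deferred = []
    for cat, vec in centroids.items():
        # Rank child groups by similarity to this category's centroid.
        sims = sorted(((cosine(vec, s), i) for i, s in enumerate(seeds)), reverse=True)
        best_sim, best_i = sims[0]
        second_sim = sims[1][0] if len(sims) > 1 else float("-inf")
        if best_sim - second_sim < margin:
            # Ambiguous: postpone the decision rather than risk an early error
            # that would be passed down to lower levels.
            deferred.append(cat)
        else:
            groups[best_i].append(cat)
    return groups, deferred


if __name__ == "__main__":
    cats = {"sports": [1.0, 0.0], "politics": [0.0, 1.0], "mixed": [1.0, 1.0]}
    groups, deferred = relaxed_split(cats, [[1.0, 0.0], [0.0, 1.0]])
    print(groups)    # [['sports'], ['politics']]
    print(deferred)  # ['mixed']
```

In this reading, a deferred category stays at the current node and can be re-examined once the child groups are refined, which is how a relaxed hierarchy would avoid committing a borderline category to the wrong branch early.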
- Is Part Of:
- Expert systems with applications. Volume 100(2018)
- Journal:
- Expert systems with applications
- Issue:
- Volume 100(2018)
- Issue Display:
- Volume 100, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 100
- Issue:
- 2018
- Issue Sort Value:
- 2018-0100-2018-0000
- Page Start:
- 157
- Page End:
- 164
- Publication Date:
- 2018-06-15
- Subjects:
- Hierarchy classification -- Relaxation strategy -- Least Information Theory -- Term weighting
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2018.02.003
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5859.xml