Concept decompositions for short text clustering by identifying word communities. (April 2018)
- Record Type:
- Journal Article
- Title:
- Concept decompositions for short text clustering by identifying word communities. (April 2018)
- Main Title:
- Concept decompositions for short text clustering by identifying word communities
- Authors:
- Jia, Caiyan
Carson, Matthew B.
Wang, Xiaoyang
Yu, Jian
- Abstract:
- Highlights: A new concept decomposition method, WordCom, is proposed. It creates concept vectors by identifying semantic word communities from a weighted word co-occurrence network. It is not only robust to the sparsity of short texts but also overcomes the curse of dimensionality. It scales to a large number of short text inputs due to the concept vectors being obtained from term-term space. Experimental tests have shown that the proposed method outperforms state-of-the-art algorithms.
Abstract: Short text clustering is an increasingly important methodology but faces the challenges of sparsity and high dimensionality of text data. Previous concept decomposition methods have obtained concept vectors via the centroids of clusters using k-means-type clustering algorithms on normal, full texts. In this study, we propose a new concept decomposition method that creates concept vectors by identifying semantic word communities from a weighted word co-occurrence network extracted from a short text corpus or a subset thereof. The cluster memberships of short texts are then estimated by mapping the original short texts to the learned semantic concept vectors. The proposed method is not only robust to the sparsity of short text corpora but also overcomes the curse of dimensionality, scaling to a large number of short text inputs due to the concept vectors being obtained from term-term instead of document-term space. Experimental tests have shown that the proposed method outperforms state-of-the-art algorithms.
- Is Part Of:
- Pattern recognition. Volume 76(2018:Apr.)
- Journal:
- Pattern recognition
- Issue:
- Volume 76(2018:Apr.)
- Issue Display:
- Volume 76 (2018)
- Year:
- 2018
- Volume:
- 76
- Issue Sort Value:
- 2018-0076-0000-0000
- Page Start:
- 691
- Page End:
- 703
- Publication Date:
- 2018-04
- Subjects:
- Short text clustering -- Concept decomposition -- Spherical k-means -- Semantic word community -- Community detection
00-01 -- 99-00
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2017.09.045
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11338.xml