Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts. (8th July 2020)
- Record Type:
- Journal Article
- Title:
- Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts. (8th July 2020)
- Main Title:
- Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts
- Authors:
- Zhang, Kai
Zhou, Yuan
Chen, Zheng
Liu, Yufei
Tang, Zhuo
Yin, Li
Chen, Jihong
- Abstract:
- The prevalence of short texts on the Web has made mining the latent topic structures of short texts a critical and fundamental task for many applications. However, due to the lack of word co-occurrence information induced by the content sparsity of short texts, it is challenging for traditional topic models like latent Dirichlet allocation (LDA) to extract coherent topic structures on short texts. Incorporating external semantic knowledge into the topic modeling process is an effective strategy to improve the coherence of inferred topics. In this paper, we develop a novel topic model, called biterm correlation knowledge-based topic model (BCK-TM), to infer latent topics from short texts. Specifically, the proposed model mines biterm correlation knowledge automatically based on recent progress in word embedding, which can represent semantic information of words in a continuous vector space. To incorporate external knowledge, a knowledge incorporation mechanism is designed over the latent topic layer to regularize the topic assignment of each biterm during the topic sampling process. Experimental results on three public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
- Is Part Of:
- Computer journal. Volume 65:Number 3(2022)
- Journal:
- Computer journal
- Issue:
- Volume 65:Number 3(2022)
- Issue Display:
- Volume 65, Issue 3 (2022)
- Year:
- 2022
- Volume:
- 65
- Issue:
- 3
- Issue Sort Value:
- 2022-0065-0003-0000
- Page Start:
- 537
- Page End:
- 553
- Publication Date:
- 2020-07-08
- Subjects:
- topic model -- short texts -- biterm correlation knowledge -- word embedding -- Gibbs sampling
Computers -- Periodicals
005.1
- Journal URLs:
- http://comjnl.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/comjnl/bxaa079
- Languages:
- English
- ISSNs:
- 0010-4620
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.060000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21558.xml