A topic‐based term frequency normalization framework to enhance probabilistic information retrieval. (20th November 2019)
- Record Type:
- Journal Article
- Title:
- A topic‐based term frequency normalization framework to enhance probabilistic information retrieval. (20th November 2019)
- Main Title:
- A topic‐based term frequency normalization framework to enhance probabilistic information retrieval
- Authors:
- Jian, Fanghong
Huang, Jimmy X.
Zhao, Jiashu
Ying, Zhiwei
Wang, Yuqi
- Abstract:
- Many well‐known probabilistic information retrieval models have shown promise for document ranking, especially BM25. Nevertheless, the control parameters in BM25 usually need to be tuned to achieve good performance on different data sets; additionally, BM25's bag‐of‐words assumption prevents it from directly using rich information at the sentence or document level. Motivated by these challenges, we first propose a new term frequency normalization method for BM25 (called BM25QL in this paper); the method is also incorporated into CRTER2, a recent BM25‐based model, to construct CRTER2QL. Then, we incorporate topic modeling and word embedding into BM25 to relax the bag‐of‐words assumption. In this direction, we propose a topic‐based retrieval model, TopTF, for BM25, which is further incorporated into the language model (LM) and the multiple aspect term frequency (MATF) model. Furthermore, an enhanced, embedding‐based topic term frequency normalization framework, ETopTF, is presented. Experimental studies demonstrate the effectiveness of these methods. Specifically, on all tested data sets and in terms of mean average precision (MAP), the proposed models BM25QL and CRTER2QL are comparable to BM25 and CRTER2 with the best b parameter value; the TopTF models significantly outperform the baselines, and the ETopTF models could further improve on TopTF in terms of MAP.
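The abstract's first contribution targets BM25's term frequency normalization and the need to tune its b parameter per data set. As background only, here is a minimal sketch of standard BM25 scoring in Python, showing where b enters the document-length normalization. It is not the authors' BM25QL, TopTF, or ETopTF method, which this record does not describe; the function names, the defaults k1=1.2 and b=0.75, the non-negative IDF variant, and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Saturated, length-normalized term frequency from standard BM25.

    b=0 disables length normalization; b=1 normalizes fully by doc_len / avg_doc_len.
    (k1=1.2 and b=0.75 are assumed, commonly used defaults, not values from the paper.)
    """
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


def idf(num_docs, doc_freq):
    # Assumed non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query over a small tokenized corpus.

    Each distinct query term is counted once (query term frequency is ignored).
    """
    num_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / num_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if tf[term] == 0:
            continue  # terms absent from the document contribute nothing
        doc_freq = sum(1 for d in corpus if term in d)
        score += idf(num_docs, doc_freq) * bm25_tf_component(
            tf[term], len(doc_terms), avg_len, k1, b
        )
    return score


if __name__ == "__main__":
    # Hypothetical toy corpus; vary b to see how length normalization shifts scores.
    corpus = [
        ["topic", "model", "retrieval"],
        ["bm25", "retrieval", "retrieval", "ranking", "model", "bm25"],
        ["embedding"],
    ]
    for b in (0.0, 0.75, 1.0):
        scores = [round(bm25_score(["retrieval", "model"], d, corpus, b=b), 3) for d in corpus]
        print(f"b={b}: {scores}")
```

Running the loop shows the sensitivity the abstract describes: the relative scores of the short and long documents shift as b changes, which is why b is normally tuned per collection.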
- Is Part Of:
- Computational intelligence. Volume 36:Number 2(2020)
- Journal:
- Computational intelligence
- Issue:
- Volume 36:Number 2(2020)
- Issue Display:
- Volume 36, Issue 2 (2020)
- Year:
- 2020
- Volume:
- 36
- Issue:
- 2
- Issue Sort Value:
- 2020-0036-0002-0000
- Page Start:
- 486
- Page End:
- 521
- Publication Date:
- 2019-11-20
- Subjects:
- Dirichlet language model -- embedding -- LDA -- probabilistic model -- term frequency normalization -- topic modeling
Artificial intelligence -- Periodicals
Computational linguistics -- Periodicals
006.3
- Journal URLs:
- http://www.blackwellpublishing.com/journal.asp?ref=0824-7935&site=1
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/coin.12248
- Languages:
- English
- ISSNs:
- 0824-7935
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.595000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13153.xml