Learning bag-of-embedded-words representations for textual information retrieval. (September 2018)
- Record Type:
- Journal Article
- Title:
- Learning bag-of-embedded-words representations for textual information retrieval. (September 2018)
- Main Title:
- Learning bag-of-embedded-words representations for textual information retrieval
- Authors:
- Passalis, Nikolaos
Tefas, Anastasios
- Abstract:
- Highlights: A novel BoF-based model is proposed for efficiently representing text documents. A weighting mask (similar to the traditional BoW weighting schemes) is learned. The BoEW is optimized end-to-end (from the word embeddings to the weighting mask). The learned representation can be efficiently fine-tuned using relevance feedback. The proposed method is evaluated using three text collections from different domains.
Abstract: Word embedding models are able to accurately model the semantic content of words. The process of extracting a set of word embedding vectors from a text document is similar to the feature extraction step of the Bag-of-Features (BoF) model, which is usually used in computer vision tasks. This gives rise to the proposed Bag-of-Embedded Words (BoEW) model, which can efficiently represent text documents, overcoming the limitations of previously predominant techniques such as the textual Bag-of-Words model. The proposed method extends the regular BoF model by a) incorporating a weighting mask that allows for altering the importance of each learned codeword and b) optimizing the model end-to-end (from the word embeddings to the weighting mask). Furthermore, the BoEW model also provides a fast way to fine-tune the learned representation towards the information need of the user using relevance feedback techniques. Finally, a novel spherical entropy objective function is proposed to optimize the learned representation for retrieval using the cosine similarity metric. …
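The abstract describes aggregating a document's word embeddings into a fixed-length histogram over learned codewords, weighted by a learned mask and compared with cosine similarity. A minimal sketch of that kind of encoder is below; the codeword count, the scaling factor `g`, and the function name `boew_encode` are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a BoEW-style document encoder (not the authors' code).
import numpy as np

def boew_encode(word_vecs, codewords, mask, g=1.0):
    """Aggregate a document's word embeddings into one fixed-length vector.

    word_vecs: (n_words, dim) embeddings of the document's words
    codewords: (n_codewords, dim) learned dictionary of codewords
    mask:      (n_codewords,) learned importance weight per codeword
    """
    # squared Euclidean distance of every word to every codeword
    d = ((word_vecs[:, None, :] - codewords[None, :, :]) ** 2).sum(-1)
    # soft assignment of each word to the codewords (softmax over -g * d)
    sim = np.exp(-g * d)
    memberships = sim / sim.sum(axis=1, keepdims=True)
    # average memberships over the words, then apply the weighting mask
    hist = mask * memberships.mean(axis=0)
    # l2-normalize so documents can be compared with cosine similarity
    return hist / np.linalg.norm(hist)

rng = np.random.default_rng(0)
doc = boew_encode(rng.normal(size=(12, 50)),   # 12 words, 50-dim embeddings
                  rng.normal(size=(8, 50)),    # 8 hypothetical codewords
                  np.ones(8))                  # uniform weighting mask
```

In the paper this whole pipeline (embeddings, codewords, and mask) is trained end-to-end; the sketch only shows the forward encoding step.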
- Is Part Of:
- Pattern recognition. Volume 81(2018:Sep.)
- Journal:
- Pattern recognition
- Issue:
- Volume 81(2018:Sep.)
- Issue Display:
- Volume 81 (2018)
- Year:
- 2018
- Volume:
- 81
- Issue Sort Value:
- 2018-0081-0000-0000
- Page Start:
- 254
- Page End:
- 267
- Publication Date:
- 2018-09
- Subjects:
- Word embeddings -- Bag-of-words -- Bag-of-features -- Dictionary learning -- Relevance feedback -- Information retrieval
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2018.04.008
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12876.xml