Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval. Issue 3 (May 2021)
- Record Type:
- Journal Article
- Title:
- Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval. Issue 3 (May 2021)
- Main Title:
- Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval
- Authors:
- Maisonnave, Mariano
Delbianco, Fernando
Tohmé, Fernando
Maguitman, Ana
- Abstract:
- Topic-based retrieval is the task of seeking and retrieving material related to a topic of interest. This task involves two subtasks: selecting query terms and ranking the retrieved results. Supervised approaches to assessing the importance of a term in a topic or class have proven effective for guiding the query-term selection subtask. This article analyzes and evaluates FDD β, a supervised term-weighting scheme that can be applied to query-term selection in topic-based retrieval. FDD β weights terms based on two factors representing the descriptive and discriminating power of the terms with respect to the given topic. It then combines these two factors through an adjustable parameter that makes it possible to favor different aspects of retrieval, such as precision, recall, or a balance between both. Previous preliminary studies have demonstrated the potential of FDD β to identify useful query terms. However, those studies limited the analysis to a single domain represented by a single data set with binary categories, and did not compare FDD β to other recently formulated term-weighting techniques. The contributions of this article are the following: (1) it presents an extensive analysis of the behavior of FDD β as a function of its adjustable parameter; (2) it compares FDD β against eighteen traditional and state-of-the-art weighting schemes; (3) it evaluates the performance of disjunctive queries built by combining terms selected using the analyzed methods; (4) it makes a full data set and the full code publicly available to replicate the reported analysis and foster future research in the area. The analysis and evaluations are performed on three data sets: two well-known text data sets, namely 20 Newsgroups and Reuters-21578, and the newly released data set. It is possible to conclude that despite its simplicity, FDD β is competitive with state-of-the-art methods and has the important advantage of offering flexibility when adapting to specific task goals. The results also demonstrate that FDD β offers a useful mechanism to explore different approaches to building complex queries.
Highlights: A supervised term-weighting scheme is extensively analyzed and evaluated. The scheme is evaluated in the task of query-term selection for topic-based retrieval. The performance comparison is carried out against eighteen methods on three data sets with promising results. A full manually labeled data set and the full code are made publicly available.
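The abstract describes FDD β only in words: a descriptive factor and a discriminating factor, combined by an adjustable parameter that trades precision against recall, with top-weighted terms then joined into disjunctive queries. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes descriptive power is the fraction of on-topic documents that contain a term, discriminating power is the fraction of documents containing the term that are on-topic, and that β combines them the way the F-beta measure combines recall and precision (β > 1 leans toward descriptive power, β < 1 toward discriminating power). The function names `fdd_beta_scores` and `disjunctive_query` and the toy corpus are hypothetical; the exact definitions are in the article and its released code.

```python
from collections import Counter


def fdd_beta_scores(docs, labels, topic, beta=1.0):
    """Score terms for `topic` with an ASSUMED F-beta-style FDD beta.

    docs:   list of token lists (one per document)
    labels: list of topic labels, aligned with docs
    Returns {term: score} for every term seen in at least one on-topic doc.
    """
    on_topic = sum(1 for lab in labels if lab == topic)
    df_topic = Counter()  # number of on-topic documents containing each term
    df_all = Counter()    # number of documents (any topic) containing each term
    for tokens, lab in zip(docs, labels):
        for term in set(tokens):  # document frequency: count each term once per doc
            df_all[term] += 1
            if lab == topic:
                df_topic[term] += 1

    b2 = beta * beta
    scores = {}
    for term, n_topic in df_topic.items():
        descr = n_topic / on_topic       # assumed descriptive power, P(term | topic)
        discr = n_topic / df_all[term]   # assumed discriminating power, P(topic | term)
        # F-beta-style weighted harmonic mean; both factors are > 0 here.
        scores[term] = (1 + b2) * descr * discr / (b2 * discr + descr)
    return scores


def disjunctive_query(scores, k):
    """Join the k highest-scoring terms with OR (ties broken alphabetically)."""
    top = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return " OR ".join(top)


if __name__ == "__main__":
    # Hypothetical toy corpus: "gpu" is frequent on-topic but also appears off-topic.
    docs = [["gpu", "cuda", "kernel"], ["gpu", "driver"], ["soccer", "goal"], ["gpu", "game"]]
    labels = ["hw", "hw", "sports", "sports"]
    for beta in (0.1, 1.0, 10.0):
        scores = fdd_beta_scores(docs, labels, "hw", beta=beta)
        print(beta, disjunctive_query(scores, 2))
```

On this toy corpus a small β ranks the exclusively on-topic terms ("cuda", "driver", "kernel") first, while β = 1 and β = 10 put the more widespread but less exclusive "gpu" on top, which mirrors the precision/recall flexibility the abstract attributes to the adjustable parameter.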
- Is Part Of:
- Information processing & management. Volume 58:Issue 3(2021)
- Journal:
- Information processing & management
- Issue:
- Volume 58:Issue 3(2021)
- Issue Display:
- Volume 58, Issue 3 (2021)
- Year:
- 2021
- Volume:
- 58
- Issue:
- 3
- Issue Sort Value:
- 2021-0058-0003-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-05
- Subjects:
- Term weighting -- Variable extraction -- Information retrieval -- Query-term selection -- Topic-based retrieval
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2020.102483
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22877.xml