A bias–variance evaluation framework for information retrieval systems. Issue 1 (January 2022)
- Record Type:
- Journal Article
- Title:
- A bias–variance evaluation framework for information retrieval systems. Issue 1 (January 2022)
- Main Title:
- A bias–variance evaluation framework for information retrieval systems
- Authors:
- Zhang, Peng
Gao, Hui
Hu, Zeting
Yang, Meng
Song, Dawei
Wang, Jun
Hou, Yuexian
Hu, Bin
- Abstract:
- In information retrieval (IR), improving effectiveness often sacrifices the stability of an IR system. Many risk-sensitive metrics have been proposed to evaluate stability. Owing to theoretical limitations, existing work studies effectiveness and stability separately and has not explored the effectiveness–stability tradeoff. In this paper, we propose a Bias–Variance Tradeoff Evaluation (BV-Test) framework, based on the bias–variance decomposition of the mean squared error, to measure both the overall performance of a system (considering effectiveness and stability together) and the tradeoff between its effectiveness and stability. In this framework, we define generalized bias–variance metrics based on either the Cranfield-style experimental set-up, where the document collection is fixed (across topics), or the set-up where the document collection is a sample (per-topic). Compared with risk-sensitive evaluation methods, our work not only measures the effectiveness–stability tradeoff of a system but also tracks the source of system instability. Experiments on the TREC Ad-hoc track (1993–1999) and Web track (2010–2014) show a clear effectiveness–stability tradeoff both across topics and per-topic, and show that topic grouping and max–min normalization can effectively reduce the bias–variance tradeoff. Experimental results on the TREC Session track (2010–2012) also show that query reformulation and an increase in user data benefit effectiveness and stability simultaneously.
Highlights: A unified bias–variance metric evaluates the retrieval effectiveness–stability tradeoff. A generalized bias–variance metric is defined for the across-topic and per-topic settings. Factors that influence the bias–variance metric (topic grouping, etc.) are studied. Decomposition of the variance can effectively track the source of system instability.
- Is Part Of:
- Information processing & management. Volume 59:Issue 1(2022)
- Journal:
- Information processing & management
- Issue:
- Volume 59:Issue 1(2022)
- Issue Display:
- Volume 59, Issue 1 (2022)
- Year:
- 2022
- Volume:
- 59
- Issue:
- 1
- Issue Sort Value:
- 2022-0059-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-01
- Subjects:
- Information retrieval -- Evaluation metrics -- Effectiveness–stability tradeoff
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2021.102747
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19853.xml
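
The abstract in this record rests on the bias–variance decomposition of the mean squared error. As a minimal sketch of that identity only, and not the authors' BV-Test metrics (which this record does not define), the Python below splits the squared error of per-topic scores against a target into a squared-bias term and a variance term. The function name `bias_variance`, the target value of 1.0, and the sample scores are all assumptions made for illustration.

```python
from statistics import fmean, pvariance


def bias_variance(scores, target=1.0):
    """Split the mean squared error against `target` into bias^2 and variance.

    `target` is a hypothetical ideal score (assumed 1.0 here); the paper's
    own choice of target is not given in this record.
    """
    mean_score = fmean(scores)
    bias_sq = (mean_score - target) ** 2          # effectiveness gap
    variance = pvariance(scores, mu=mean_score)   # instability across topics
    mse = fmean((s - target) ** 2 for s in scores)
    return bias_sq, variance, mse


# Hypothetical per-topic average-precision scores for one system.
ap_scores = [0.42, 0.10, 0.77, 0.35, 0.58]
bias_sq, variance, mse = bias_variance(ap_scores)

# The identity MSE = bias^2 + variance holds up to floating-point error.
assert abs(mse - (bias_sq + variance)) < 1e-12
```

Under these assumptions, a system can lower its squared bias (higher mean score) while raising its variance (less consistent scores across topics), which is the kind of effectiveness–stability tradeoff the abstract describes.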