Over‐optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. (13th December 2021)
- Record Type:
- Journal Article
- Title:
- Over‐optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results
- Main Title:
- Over‐optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results
- Authors:
- Nießl, Christina
Herrmann, Moritz
Wiedemann, Chiara
Casalicchio, Giuseppe
Boulesteix, Anne‐Laure
- Abstract:
- In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g., the selective reporting of results or the post hoc modification of design or analysis components) to fit their expectations. To raise awareness for this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options combined with questionable research practices lead to biased interpretations of benchmark results and to over‐optimistic conclusions. This issue should be considered by computational researchers when designing and analyzing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.
This article is categorized under: Technologies > Visualization; Technologies > Data Preprocessing; Technologies > Structure Discovery and Clustering
- Abstract (summary): When conducting a benchmark study, researchers are faced with many choices that relate to (1) the general aim of the study, (2) the design of the study, or (3) the analysis of the performance results. As a consequence, benchmark results are highly variable. In combination with questionable research practices, this can lead to biased interpretations and over‐optimistic conclusions.
- Is Part Of:
- Wiley interdisciplinary reviews. Volume 12, Number 2 (2022)
- Journal:
- Wiley interdisciplinary reviews
- Issue:
- Volume 12, Number 2 (2022)
- Issue Display:
- Volume 12, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 12
- Issue:
- 2
- Issue Sort Value:
- 2022-0012-0002-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2021-12-13
- Subjects:
- benchmarking -- method comparison -- over‐optimistic results -- questionable research practices -- variability of results
Data mining -- Periodicals
006.31205
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1942-4795
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/widm.1441
- Languages:
- English
- ISSNs:
- 1942-4787
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21090.xml