IR · Oct 26, 2020

How to Measure the Reproducibility of System-oriented IR Experiments

Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer, Ian Soboroff

arXiv:2010.13447v1 · 33 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses a methodological gap for information retrieval researchers who need objective ways to evaluate reproducibility. The contribution is incremental: it builds on existing concerns rather than introducing a new paradigm.

The paper tackles the problem of assessing reproducibility in system-oriented information retrieval experiments by comparing several measures to quantify replication or reproduction at different granularities, and develops a reproducibility-oriented dataset for validation and future use.

Replicability and reproducibility of experimental results are primary concerns in all the areas of science and IR is not an exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we do not have any means to assess when reproduced is reproduced. Moreover, we lack any reproducibility-oriented dataset, which would allow us to develop such methods. To address these issues, we compare several measures to objectively quantify to what extent we have replicated or reproduced a system-oriented IR experiment. These measures operate at different levels of granularity, from the fine-grained comparison of ranked lists, to the more general comparison of the obtained effects and significant differences. Moreover, we also develop a reproducibility-oriented dataset, which allows us to validate our measures and which can also be used to develop future measures.
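To make the two ends of that granularity range concrete, here is a minimal sketch in Python. The abstract does not name the specific measures, so the choices below are assumptions made for illustration: Kendall's tau stands in for a fine-grained ranked-list comparison, and root mean square error (RMSE) over per-topic scores stands in for a coarser effectiveness-level comparison. The function names, document IDs, and score values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def kendall_tau(original, reproduced):
    """Kendall's tau-a between two rankings of the same documents (no ties).

    Illustrative stand-in for a ranked-list comparison; assumes both runs
    retrieved exactly the same set of documents for the topic.
    """
    if len(set(original)) != len(original) or set(original) != set(reproduced):
        raise ValueError("rankings must contain the same unique documents")
    n = len(original)
    if n < 2:
        raise ValueError("need at least two documents to compare orderings")

    # Position of every document in the reproduced ranking.
    pos = {doc: rank for rank, doc in enumerate(reproduced)}

    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The original run ranks original[i] above original[j];
            # check whether the reproduced run agrees on that pair.
            if pos[original[i]] < pos[original[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def rmse(original_scores, reproduced_scores):
    """RMSE between per-topic effectiveness scores of two runs.

    Both arguments map a topic ID to a score (for example, average precision).
    """
    if original_scores.keys() != reproduced_scores.keys():
        raise ValueError("both runs must be scored on the same topics")
    if not original_scores:
        raise ValueError("need at least one topic")
    squared = [
        (original_scores[t] - reproduced_scores[t]) ** 2 for t in original_scores
    ]
    return sqrt(sum(squared) / len(squared))


# Hypothetical data: one topic's ranking and per-topic scores for two runs.
tau = kendall_tau(["d1", "d2", "d3"], ["d1", "d3", "d2"])
err = rmse({"t1": 0.5, "t2": 0.3}, {"t1": 0.4, "t2": 0.4})
print(f"tau={tau:.3f} rmse={err:.3f}")
```

A tau of 1 means the reproduced run orders the documents identically, while an RMSE of 0 means the per-topic scores match exactly. The two can disagree: a run may reorder documents substantially and still reach similar scores, which is one reason to look at more than one level of granularity.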
