IR | Nov 1, 2020

Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments

arXiv:2011.00479v1 | 1.6

Originality: Incremental advance

AI Analysis

This work addresses the high cost and inefficiency of traditional IR evaluation methods, offering researchers and companies a more practical, engineered alternative.

The paper tackles the expense of evaluating Information Retrieval (IR) systems. It proposes a novel approach that reduces the number of topics and human relevance judgments required and uses crowdsourced assessments, saving resources while still evaluating effectiveness.

To evaluate Information Retrieval (IR) effectiveness, a possible approach is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and a set of relevant documents for each topic. Test collections are modelled in a competition scenario: for example, in the well-known TREC initiative, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents; some of the retrieved documents (usually the top-ranked ones) constitute the so-called pool, and their relevance is evaluated by human assessors; the document list is then used to compute effectiveness metrics and rank the participant systems. Private web search companies also run their own in-house evaluation exercises; although the details are mostly unknown, and the aims are somewhat different, the overall approach shares several issues with the test collection approach. The aim of this work is to: (i) develop and improve some state-of-the-art work on the evaluation of IR effectiveness while saving resources, and (ii) propose a novel, more principled and engineered, overall approach to test-collection-based effectiveness evaluation. [...]
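The test collection workflow described above (pooling the top-ranked documents, judging them, then scoring and ranking systems) can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's method: the system names, document ids, pool depth, relevance judgments, and the choice of precision at k as the metric are all hypothetical assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# A run maps each topic id to that system's ranked list of document ids.
Run = Dict[str, List[str]]


def build_pool(runs: Dict[str, Run], depth: int) -> Dict[str, Set[str]]:
    """Per topic, take the union of every system's top-`depth` documents."""
    pool: Dict[str, Set[str]] = {}
    for run in runs.values():
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that were judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def rank_systems(
    runs: Dict[str, Run], qrels: Dict[str, Set[str]], k: int
) -> List[Tuple[str, float]]:
    """Average precision@k over topics, then sort systems best-first."""
    scores = {}
    for name, run in runs.items():
        per_topic = [precision_at_k(run[t], qrels.get(t, set()), k) for t in run]
        scores[name] = sum(per_topic) / len(per_topic)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical runs from two participant systems over two topics.
runs = {
    "sysA": {"t1": ["d1", "d2", "d3"], "t2": ["d4", "d5", "d6"]},
    "sysB": {"t1": ["d3", "d7", "d1"], "t2": ["d6", "d8", "d4"]},
}

# Only pooled documents are shown to human assessors.
pool = build_pool(runs, depth=2)

# Hypothetical assessor judgments: the pooled documents judged relevant.
qrels = {"t1": {"d1", "d3"}, "t2": {"d4"}}

ranking = rank_systems(runs, qrels, k=2)
print(ranking)
```

Each judged document in the pool costs assessor time, and the cost grows with the number of topics, the pool depth, and the number of participating systems; these are the quantities that cheaper evaluation approaches try to reduce.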
