IR | Aug 23, 2017

Evaluation Measures for Relevance and Credibility in Ranked Lists

Christina Lioma, Jakob Grue Simonsen, Birger Larsen

arXiv:1708.07157v1

Originality: Incremental advance

AI Analysis

The paper addresses the problem of biased evaluation in information retrieval, which concerns both researchers and practitioners. Its contribution is incremental because it builds on existing evaluation frameworks.

The paper tackles the lack of a unified evaluation measure for both relevance and credibility in ranked retrieval results by proposing two novel types of measures, which are shown to be expressive and intuitive on a small human-annotated dataset.

Recent discussions on alternative facts, fake news, and post truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology is in place for filtering information according to relevance and/or credibility, no single measure currently exists for evaluating the accuracy or precision (and more generally effectiveness) of both the relevance and the credibility of retrieved results. One obvious way of doing so is to measure relevance and credibility effectiveness separately, and then consolidate the two measures into one. There are at least two problems with such an approach: (I) it is not certain that the same criteria are applied to the evaluation of both relevance and credibility (and applying different criteria introduces bias to the evaluation); (II) many more and richer measures exist for assessing relevance effectiveness than for assessing credibility effectiveness (hence risking further bias). Motivated by the above, we present two novel types of evaluation measures that are designed to measure the effectiveness of both relevance and credibility in ranked lists of retrieval results. Experimental evaluation on a small human-annotated dataset (that we make freely available to the research community) shows that our measures are expressive and intuitive in their interpretation.
