IR, Feb 10, 2013

On Search Engine Evaluation Metrics

arXiv:1302.2318v1, 12 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of unreliable metrics in search engine evaluation, which concerns both researchers and practitioners. Its contribution is incremental, since it builds on existing critiques and studies.

The paper tackles the problem of evaluating search engine evaluation metrics by introducing a meta-evaluation metric called Preference Identification Ratio (PIR) and a framework for testing multiple metrics, showing that unquestioning adherence to metrics or their parameters is disadvantageous.

Search engine evaluation research has quite a lot of metrics available to it. Only recently has the question of the significance of individual metrics been raised, as the correlation of these metrics with real-world user experience or performance has generally not been well studied. The first part of this thesis provides an overview of previous literature on the evaluation of search engine evaluation metrics themselves, as well as critiques of and comments on individual studies and approaches. The second part introduces a meta-evaluation metric, the Preference Identification Ratio (PIR), that quantifies the capacity of an evaluation metric to capture users' preferences. It also introduces a framework for evaluating many metrics simultaneously while varying their parameters and evaluation standards. Both PIR and the meta-evaluation framework are tested in a study which shows some interesting preliminary results; in particular, unquestioning adherence to metrics or their ad hoc parameters seems to be disadvantageous. Instead, evaluation methods should themselves be rigorously evaluated with regard to the goals set for a particular study.
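To make the idea of a preference-based meta-evaluation more concrete, here is a minimal Python sketch. It is an illustrative reading of the abstract, not the thesis's exact definition of PIR: it assumes PIR can be approximated as the share of result-list comparisons in which the metric prefers the same list as the user, with metric ties counted as half. The function names, the choice of precision at k as the metric under test, and the toy data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One comparison: relevance labels of list A, of list B, and the user's choice.
Comparison = Tuple[Sequence[int], Sequence[int], str]


def precision_at_k(relevances: Sequence[int], k: int) -> float:
    """Fraction of relevant results (label 1) among the top k results."""
    if k <= 0:
        return 0.0
    return sum(relevances[:k]) / k


def preference_identification_ratio(
    comparisons: List[Comparison],
    metric: Callable[[Sequence[int]], float],
) -> float:
    """Assumed PIR approximation: share of comparisons where the metric's
    preferred list matches the user's; a metric tie counts as 0.5."""
    if not comparisons:
        raise ValueError("need at least one comparison")
    score = 0.0
    for rels_a, rels_b, user_pref in comparisons:
        diff = metric(rels_a) - metric(rels_b)
        if diff == 0:
            score += 0.5  # metric cannot decide: treated as a coin flip
        elif (diff > 0) == (user_pref == "a"):
            score += 1.0  # metric agrees with the user's preference
    return score / len(comparisons)


# Hypothetical toy data: binary relevance per rank, plus the preferred list.
comparisons: List[Comparison] = [
    ([1, 1, 0, 0], [0, 0, 1, 1], "a"),
    ([0, 1, 0, 1], [1, 0, 0, 0], "b"),
    ([1, 0, 0, 0], [0, 1, 1, 1], "b"),
]

# Vary the metric's parameter (the cutoff k) instead of fixing it ad hoc.
pir_by_k = {
    k: preference_identification_ratio(
        comparisons, lambda rels, k=k: precision_at_k(rels, k)
    )
    for k in range(1, 5)
}

for k, pir in pir_by_k.items():
    print(f"precision@{k}: PIR = {pir:.2f}")
```

Sweeping the cutoff in this way mirrors, in miniature, the abstract's point that a metric's parameters should be evaluated rather than assumed: on this toy data the agreement with user preferences changes with k.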
