SI DL IR DATA-AN
Jan 15, 2020

Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data

Shuqi Xu, Manuel Sebastian Mariani, Linyuan Lü, Matúš Medo

arXiv:2001.05414v1
33 citations

AI Analysis

This work addresses biased evaluation of the ranking metrics used in research assessment. It offers an incremental improvement: a modified evaluation procedure that reveals metric performance patterns that are consistent across datasets.

The study evaluates 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. It finds that traditional information-retrieval evaluation metrics are distorted by the interplay between the age distribution of the milestone items and the age biases of the metrics under evaluation, and it proposes a modified procedure that explicitly penalizes biased metrics. PageRank and LeaderRank turn out to be the best-performing metrics once their age bias is suppressed.

Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics' performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
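The abstract does not spell out the "simple transformation of the scores" that suppresses age bias. One plausible form, assumed here purely for illustration, is to compare each item's score only with the scores of items of similar age, by computing a z-score within a sliding age window. The function name `rescale_by_age` and the `window` parameter are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def rescale_by_age(scores, window=4):
    """Return age-rescaled scores (illustrative sketch, not the paper's code).

    `scores` holds raw metric values (e.g. PageRank) ordered from the
    oldest item to the newest. Each score is replaced by its z-score
    relative to the items within `window // 2` positions on either side,
    so an item is only compared with items of similar age.
    """
    half = window // 2
    n = len(scores)
    rescaled = []
    for i, score in enumerate(scores):
        # Age neighbourhood, truncated at both ends of the list.
        neighbours = scores[max(0, i - half):min(n, i + half + 1)]
        mu = mean(neighbours)
        sigma = pstdev(neighbours)
        # A flat neighbourhood carries no ranking signal: map it to 0.
        rescaled.append((score - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Raw scores that simply decay with age: old items dominate the raw ranking.
raw = [9.0, 8.0, 7.5, 4.0, 6.0, 2.0, 1.0, 1.5]
adjusted = rescale_by_age(raw, window=4)
```

After rescaling, an item ranks highly only if it stands out among its age peers, so a metric that merely favors old (or recent) items no longer gains an advantage from that preference.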
