IR LG ML · Jun 5

Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies

Ekaterina Grishina, Stepan Kuznetsov, Askar Tsyganov, Ilya Ivanov, Daria Korovaitceva, Margarita Rusanova, Uliana Parkina, Alexander Derevyagin, Evgeny Frolov, Sergey Samsonov, Anton Lysenko

arXiv:2606.07492 · 15.8

Originality: Incremental advance

AI Analysis

For practitioners and researchers in recommender systems, the paper offers a more reliable, data-driven way to compare algorithms than naive metric aggregation.

The paper addresses the challenge of ranking recommendation algorithms fairly across diverse datasets, proposing a Bradley-Terry-based methodology that yields robust rankings dependent on dataset statistics and enables ranking on unseen datasets without model execution.

Ranking recommendation algorithms is challenging because model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This creates a need for a sound methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings, undermining practical model selection. To address this problem, we introduce a novel, data-driven ranking methodology based on the Bradley-Terry (BT) model. We demonstrate that the resulting ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate the robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates.
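To make the contrast between metric averaging and a Bradley-Terry ranking concrete, here is a minimal sketch. It is not the paper's implementation. The abstract does not say how pairwise comparisons are built or how the model is fit, so the sketch assumes that each dataset contributes one win to whichever algorithm of a pair has the higher NDCG, and it fits the standard BT model, P(i beats j) = p_i / (p_i + p_j), with the classical minorization-maximization iteration. The function names `pairwise_wins` and `fit_bradley_terry`, the algorithm labels, and all NDCG values are hypothetical.

```python
import numpy as np


def pairwise_wins(scores):
    """wins[i, j] = number of datasets on which algorithm i scores strictly
    higher than algorithm j. Ties count for neither side (an assumption)."""
    scores = np.asarray(scores, dtype=float)  # shape: (n_algos, n_datasets)
    diff = scores[:, None, :] - scores[None, :, :]
    return (diff > 0).sum(axis=2).astype(float)


def fit_bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Estimate BT strengths p (normalized to sum to 1) with the classical
    minorization-maximization update. Assumes every algorithm wins and loses
    at least once; otherwise the maximum-likelihood estimate does not exist."""
    n = wins.shape[0]
    games = wins + wins.T            # number of comparisons per pair
    total_wins = wins.sum(axis=1)    # total wins per algorithm
    p = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p


# Made-up NDCG values: rows are algorithms, columns are datasets.
algos = ["A", "B", "C"]
ndcg = np.array([
    [0.90, 0.10, 0.12, 0.11, 0.13],  # A: one outlier dataset, weak elsewhere
    [0.30, 0.20, 0.22, 0.21, 0.23],  # B: consistently good
    [0.20, 0.15, 0.25, 0.16, 0.18],  # C
])

mean_order = [algos[i] for i in np.argsort(-ndcg.mean(axis=1))]
strengths = fit_bradley_terry(pairwise_wins(ndcg))
bt_order = [algos[i] for i in np.argsort(-strengths)]

print("mean-NDCG order:", mean_order)
print("Bradley-Terry order:", bt_order)
```

With these made-up numbers, averaging puts A first because a single dataset dominates its mean, while the pairwise BT fit puts B first because B wins most head-to-head comparisons. The paper's extensions (BT trees, BT with covariates) are not covered by this sketch.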
