An Alternative Cross Entropy Loss for Learning-to-Rank
This work addresses a theoretical gap in ranking algorithms for information retrieval, offering a more principled approach that could improve ranking systems, though it is incremental in providing a new loss function rather than a paradigm shift.
The authors tackled the lack of theoretical grounding in listwise learning-to-rank methods by proposing a cross entropy-based loss function that is a convex bound on NDCG and consistent with it under common scenarios, achieving superior quality and robustness in empirical evaluations on benchmark datasets.
Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set -- as a surrogate to a typically non-differentiable ranking metric. Despite their empirical success, existing listwise methods are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically successful loss functions are related to ranking metrics. In this work, we propose a cross entropy-based learning-to-rank loss function that is theoretically sound, is a convex bound on NDCG -- a popular ranking metric -- and is consistent with NDCG under learning scenarios common in information retrieval. Furthermore, empirical evaluation of an implementation of the proposed method with gradient boosting machines on benchmark learning-to-rank datasets demonstrates the superiority of our proposed formulation over existing algorithms in quality and robustness.