Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders
This work addresses the problem of inefficient and theoretically unsupported ranking methods for retrieval systems, offering incremental improvements in training and theoretical understanding.
The paper tackles the lack of theoretical foundation and rank-agnostic training loss in autoregressive ranking (ARR) for retrieval, proving ARR's expressive superiority over dual encoders and proposing a rank-aware loss that improves ranking metrics beyond top-1 retrieval.
The success of Large Language Models (LLMs) has motivated a shift toward generative approaches to retrieval and ranking, aiming to supersede classical Dual Encoders (DEs) and Cross Encoders (CEs). A prominent paradigm is pointwise Autoregressive Ranking (ARR), where an LLM generates document identifiers (docIDs) token-by-token to enable ranking via beam search. ARR offers the promise of superior expressivity compared to DEs while avoiding the prohibitive computational cost of CEs. However, a formal theoretical foundation for this expressive power has been missing. Moreover, the standard next-token prediction loss is rank-agnostic and inappropriate for finetuning an LLM for ranking tasks. In this paper, we first prove that the expressive capacity of ARR is strictly superior to DEs. While a DE requires an embedding dimension that grows linearly with corpus size to achieve arbitrary rankings, ARR can solve it with a constant hidden dimension. We then propose SToICaL (Simple Token-Item Calibrated Loss), a generalized rank-aware training loss for LLM finetuning. By using item-level reweighting and prefix-tree marginalization, we distribute probability mass over valid docID tokens based on their ground-truth relevance. Experiments on WordNet and ESCI datasets verify that our loss suppresses invalid docID generations and significantly improves ranking metrics beyond top-1 retrieval.