IR · Jun 3

EviRank: Evidence-Based Confidence Estimation for LLM-Based Ranking

Meng Yan, Cai Xv, Xujing Wang, Ziyu Guan, Wei Zhao

arXiv:2606.0472771.3

AI Analysis

For practitioners using LLMs for ranking tasks, EviRank offers a way to identify which positions in a ranked list are unreliable, making the ranking more trustworthy.

EviRank addresses the lack of position-specific reliability estimates in LLM-based ranking. It extracts three complementary pieces of evidence from a single forward pass, aggregates them via reliable opinion aggregation, and applies position-aware calibration. It achieves state-of-the-art performance on both recommendation and uncertainty quantification across three datasets.
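The summary does not say what the three pieces of evidence are or how "reliable opinion aggregation" is defined. As a reference point only, the sketch below uses standard subjective-logic machinery: each evidence signal for a single ranking position is assumed to arrive as non-negative positive/negative evidence counts, which are mapped to a binomial opinion (belief, disbelief, uncertainty) and merged with cumulative fusion. The names `Opinion`, `opinion_from_evidence`, `cumulative_fuse`, and `fuse_position`, and the prior weight of 2, are hypothetical choices for illustration, not EviRank's implementation.

```python
from dataclasses import dataclass
from functools import reduce

# Assumption: non-informative prior weight W from standard subjective logic.
PRIOR_WEIGHT = 2.0


@dataclass(frozen=True)
class Opinion:
    """Binomial subjective-logic opinion: belief + disbelief + uncertainty = 1."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5

    def projected_probability(self) -> float:
        # Expected probability: belief plus the base-rate share of uncertainty.
        return self.belief + self.base_rate * self.uncertainty


def opinion_from_evidence(positive: float, negative: float,
                          base_rate: float = 0.5) -> Opinion:
    """Map non-negative evidence counts to an opinion (Beta-evidence mapping)."""
    if positive < 0 or negative < 0:
        raise ValueError("evidence must be non-negative")
    total = positive + negative + PRIOR_WEIGHT
    return Opinion(positive / total, negative / total, PRIOR_WEIGHT / total, base_rate)


def cumulative_fuse(a: Opinion, b: Opinion) -> Opinion:
    """Cumulative fusion of two independent opinions (equal base rates assumed)."""
    denom = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if denom == 0:
        # Only happens when both opinions are dogmatic (uncertainty == 0).
        raise ValueError("cannot cumulatively fuse two dogmatic opinions")
    belief = (a.belief * b.uncertainty + b.belief * a.uncertainty) / denom
    disbelief = (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / denom
    uncertainty = (a.uncertainty * b.uncertainty) / denom
    return Opinion(belief, disbelief, uncertainty, a.base_rate)


def fuse_position(evidence_per_source: list[tuple[float, float]]) -> Opinion:
    """Fuse the (positive, negative) evidence of every source for one position."""
    opinions = [opinion_from_evidence(r, s) for r, s in evidence_per_source]
    return reduce(cumulative_fuse, opinions)


if __name__ == "__main__":
    # Three hypothetical evidence signals for one position; the last carries none.
    fused = fuse_position([(4, 1), (2, 2), (0, 0)])
    print(fused.belief, fused.uncertainty, fused.projected_probability())
```

With these inputs the fused opinion has belief 6/11 and uncertainty 2/11; the third, evidence-free source is vacuous (uncertainty 1) and leaves the result unchanged. When every opinion comes from evidence with the same prior weight, cumulative fusion is equivalent to summing the evidence counts, which is one reason it is a common default for combining independent signals. The uncertainty mass, rather than the projected probability alone, is what lets a low-evidence position be flagged as unreliable.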

Large Language Models show promise for recommendation, but they raise reliability concerns due to limited domain coverage and inherent stochasticity. Existing uncertainty quantification methods face two fundamental challenges: (1) the global confidence score designed for question answering fails to reveal which positions in a ranking list are unreliable; (2) fine-grained confidence extracted from model internals is uniformly low across all positions, making it impossible to filter out unreliable predictions. To tackle these challenges, we propose EviRank, an evidence-based confidence estimation method for LLM-based ranking. We extract three complementary pieces of evidence from a single forward pass and aggregate them via reliable opinion aggregation. Furthermore, recognizing that ranking positions are inherently unequal, we introduce a position-aware calibration. Finally, the calibrated confidence guides ranking optimization. Experiments on three datasets demonstrate that our method achieves state-of-the-art performance on both recommendation and uncertainty quantification.

