IR, Jun 3

EviRank: Evidence-Based Confidence Estimation for LLM-Based Ranking

arXiv:2606.04727
71.3
AI Analysis

For practitioners using LLMs for ranking tasks, EviRank provides a method to identify unreliable positions in the ranking list, improving trustworthiness.

EviRank addresses the lack of position-specific reliability in LLM-based ranking by extracting three complementary types of evidence from a single forward pass, aggregating them via reliable opinion aggregation, and applying position-aware calibration. The authors report state-of-the-art performance on both recommendation and uncertainty quantification across three datasets.

Large Language Models show promise for recommendation, but they raise reliability concerns due to limited domain coverage and inherent stochasticity. Existing uncertainty quantification methods face two fundamental challenges: (1) the global confidence score designed for question answering fails to reveal which positions in a ranking list are unreliable; (2) fine-grained confidence extracted from model internals exhibits uniformly low values across all positions, making it impossible to filter out unreliable predictions. To tackle these challenges, we propose EviRank, an evidence-based confidence estimation method for LLM-based ranking. We extract three complementary types of evidence from a single forward pass and aggregate them via reliable opinion aggregation. Furthermore, we recognize that ranking positions are inherently unequal and introduce a position-aware calibration. Lastly, the calibrated confidence guides ranking optimization. Experiments on three datasets demonstrate that our method achieves state-of-the-art performance on both recommendation and uncertainty quantification.
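
To make the idea of position-specific confidence concrete, here is a minimal sketch. It is not the paper's method: the abstract does not specify the three evidence types, the aggregation rule, or the calibration, so this sketch swaps in a plain weighted average and per-position thresholds as stand-ins. The names `aggregate_confidence`, `flag_unreliable`, `weights`, and `thresholds` are hypothetical.

```python
from typing import List, Sequence


def aggregate_confidence(
    evidence: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine several evidence sources into one confidence per ranking position.

    evidence[k][i] is the score in [0, 1] that source k assigns to position i.
    A weighted average is an assumed stand-in for the paper's
    "reliable opinion aggregation", which the abstract does not define.
    """
    total = sum(weights)
    n_positions = len(evidence[0])
    return [
        sum(w * source[i] for w, source in zip(weights, evidence)) / total
        for i in range(n_positions)
    ]


def flag_unreliable(
    confidence: Sequence[float],
    thresholds: Sequence[float],
) -> List[int]:
    """Return the 0-based positions whose confidence is below their own threshold.

    Per-position thresholds are an assumed stand-in for position-aware
    calibration: top positions can be held to a stricter bar than lower ones.
    """
    return [i for i, (c, t) in enumerate(zip(confidence, thresholds)) if c < t]


# Three hypothetical evidence sources scoring a ranking of three items.
evidence = [
    [0.9, 0.5, 0.2],
    [0.8, 0.5, 0.4],
    [1.0, 0.5, 0.3],
]
weights = [1.0, 1.0, 2.0]      # assumed reliability of each source
thresholds = [0.8, 0.6, 0.4]   # assumed stricter bar at the top of the list

confidence = aggregate_confidence(evidence, weights)
unreliable = flag_unreliable(confidence, thresholds)
print(confidence, unreliable)
```

The point of the sketch is only the shape of the output: one confidence value per position, compared against a position-dependent bar, rather than a single global score for the whole ranking.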

