CL LG · Oct 11, 2023

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models

Raphael Tang, Xinyu Zhang, Xueguang Ma, Jimmy Lin, Ferhan Ture

arXiv:2310.07712v2 · 14.249 citations · h-index: 26 · Has Code

Originality: Highly original

AI Analysis

The paper addresses positional bias in LLM-based ranking for tasks such as sorting and passage reranking, and proposes a method that surpasses the previous state of the art in passage reranking.

The paper tackles positional bias in listwise ranking with large language models (LLMs) by proposing permutation self-consistency, which marginalizes out the order of the list in the prompt. On five list-ranking datasets, this improves scores over conventional inference by up to 7-18% for GPT-3.5 and 8-16% for LLaMA v2 (70B).

Large language models (LLMs) exhibit positional bias in how they use context, which especially complicates listwise ranking. To address this, we propose permutation self-consistency, a form of self-consistency over ranking list outputs of black-box LLMs. Our key idea is to marginalize out different list orders in the prompt to produce an order-independent ranking with less positional bias. First, given some input prompt, we repeatedly shuffle the list in the prompt and pass it through the LLM while holding the instructions the same. Next, we aggregate the resulting sample of rankings by computing the central ranking closest in distance to all of them, marginalizing out prompt order biases in the process. Theoretically, we prove the robustness of our method, showing convergence to the true ranking in the presence of random perturbations. Empirically, on five list-ranking datasets in sorting and passage reranking, our approach improves scores from conventional inference by up to 7-18% for GPT-3.5 and 8-16% for LLaMA v2 (70B), surpassing the previous state of the art in passage reranking. Our code is at https://github.com/castorini/perm-sc.
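The two steps described in the abstract (shuffle the list and re-rank, then aggregate into a central ranking) can be sketched as follows. This is a minimal illustration, not the authors' implementation (see their repository for that). It makes three assumptions. The distance between rankings is taken to be the Kendall tau distance. The central ranking is found by brute force over all permutations, which is only practical for short lists. The names `rank_fn`, `biased_rank_fn`, and `permutation_self_consistency` are hypothetical: `rank_fn` stands in for a black-box LLM call that receives the list in prompt order and returns a ranking.

```python
import itertools
import random
from typing import Callable, Hashable, List, Sequence


def kendall_tau_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Count the item pairs that rankings a and b put in opposite order."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for i, j in itertools.combinations(range(len(a)), 2)
        if pos_b[a[i]] > pos_b[a[j]]
    )


def central_ranking(rankings: List[Sequence[Hashable]]) -> List[Hashable]:
    """Return the permutation with the smallest total distance to all rankings.

    Brute force over all n! permutations (an assumption made for clarity);
    only suitable for short lists.
    """
    candidates = itertools.permutations(rankings[0])
    best = min(
        candidates,
        key=lambda c: sum(kendall_tau_distance(c, r) for r in rankings),
    )
    return list(best)


def permutation_self_consistency(
    items: Sequence[Hashable],
    rank_fn: Callable[[List[Hashable]], Sequence[Hashable]],
    num_samples: int = 20,
    seed: int = 0,
) -> List[Hashable]:
    """Shuffle the input list, rank each shuffle, and aggregate the outputs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        shuffled = list(items)
        rng.shuffle(shuffled)  # only the list order changes between calls
        samples.append(list(rank_fn(shuffled)))
    return central_ranking(samples)


def biased_rank_fn(prompt_items: List[int]) -> List[int]:
    """Hypothetical stand-in for an LLM with positional bias.

    It always keeps whatever item appears first in the prompt at rank 1
    and sorts the remaining items correctly.
    """
    return [prompt_items[0]] + sorted(prompt_items[1:])


if __name__ == "__main__":
    items = [3, 1, 2, 5, 4]
    print("single pass:", biased_rank_fn(list(items)))
    print("aggregated: ", permutation_self_consistency(items, biased_rank_fn))
```

With the toy ranker, a single pass leaves the first prompt item stuck at the top, while each shuffled sample misplaces a different item, so the aggregate tends to cancel these errors out. In practice `rank_fn` would wrap the actual model call with fixed instructions.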
