AIJan 8

OptiSet: Unified Optimizing Set Selection and Ranking for Retrieval-Augmented Generation

Yi Jiang, Sendong Zhao, Jianbo Li, Bairui Hu, Yanrui Du, Haochun Wang, Bing Qin

arXiv:2601.05027v12.4h-index: 13

Originality Incremental advance

AI Analysis

This addresses the issue of inefficient and redundant evidence retrieval for users of RAG systems, representing an incremental improvement over existing methods.

The paper tackles the problem of redundancy and suboptimal evidence selection in Retrieval-Augmented Generation (RAG) by proposing OptiSet, a framework that unifies set selection and ranking, resulting in improved performance on complex combinatorial problems and more efficient generation.

Retrieval-Augmented Generation (RAG) improves generation quality by incorporating evidence retrieved from large external corpora. However, most existing methods rely on statically selecting top-k passages based on individual relevance, which fails to exploit combinatorial gains among passages and often introduces substantial redundancy. To address this limitation, we propose OptiSet, a set-centric framework that unifies set selection and set-level ranking for RAG. OptiSet adopts an "Expand-then-Refine" paradigm: it first expands a query into multiple perspectives to enable a diverse candidate pool and then refines the candidate pool via re-selection to form a compact evidence set. We then devise a self-synthesis strategy without strong LLM supervision to derive preference labels from the set conditional utility changes of the generator, thereby identifying complementary and redundant evidence. Finally, we introduce a set-list wise training strategy that jointly optimizes set selection and set-level ranking, enabling the model to favor compact, high-gain evidence sets. Extensive experiments demonstrate that OptiSet improves performance on complex combinatorial problems and makes generation more efficient. The source code is publicly available.

View on arXiv PDF

Similar