Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion
This work addresses the efficiency problem in practical retrieval systems by enabling effective query expansion with smaller models. The contribution is incremental, since it builds on existing distillation and preference-alignment methods.
The paper tackles the high inference cost of using large language models for query expansion in retrieval systems. It proposes a retrieval-feedback-driven distillation and preference-alignment framework that transfers expansion behavior from a large teacher model to a compact student model. The student reaches about 97% of the teacher's nDCG@10 on DL19 at lower inference cost.
Large language models have recently enabled a generative paradigm for query expansion, but their high inference cost makes direct deployment difficult in practical retrieval systems. To address this issue, a retrieval-feedback-driven distillation and preference-alignment framework is proposed to transfer retrieval-friendly expansion behavior from a strong teacher model to a compact student model. Rather than relying on few-shot exemplars at inference time, the framework first leverages two complementary types of teacher-generated expansions, produced under zero-shot and few-shot prompting conditions, as supervision signals for distillation and as candidate pools for preference construction. A retrieval-metric-driven strategy is then introduced to automatically form chosen/rejected expansion pairs according to nDCG@10 differences, and Direct Preference Optimization is applied to explicitly align generation preferences with retrieval objectives. Experiments on TREC DL19/20/21 and MIRACL-zh show that the proposed approach preserves strong retrieval effectiveness while substantially reducing inference cost. In particular, the distilled Qwen3-4B model reaches about 97% of the teacher (DeepSeek-685B) model's nDCG@10 performance on DL19, and remains effective on the Chinese MIRACL-zh benchmark, demonstrating strong practicality across both English and Chinese retrieval settings.
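The retrieval-metric-driven pair construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `min_gap` threshold, the linear-gain form of nDCG, and the `prompt`/`chosen`/`rejected` record layout are all assumptions made for illustration; the abstract states only that pairs are formed according to nDCG@10 differences between teacher-generated expansions.

```python
import math
from itertools import combinations


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k with linear gains (an assumption; some setups use 2^rel - 1).

    ranked_doc_ids: document ids in retrieval order.
    qrels: dict mapping document id -> graded relevance for this query.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def build_preference_pairs(query, scored_expansions, min_gap=0.05):
    """Form chosen/rejected pairs from candidate expansions of one query.

    scored_expansions: list of (expansion_text, ndcg_at_10) tuples, where each
    score comes from retrieving with that expansion appended to the query.
    min_gap: hypothetical threshold; pairs whose scores differ by less are
    skipped because the preference signal is too weak.
    """
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(scored_expansions, 2):
        if abs(score_a - score_b) < min_gap:
            continue
        chosen, rejected = (text_a, text_b) if score_a > score_b else (text_b, text_a)
        pairs.append({"prompt": query, "chosen": chosen, "rejected": rejected})
    return pairs
```

In use, the zero-shot and few-shot teacher expansions for a query would each be scored by running retrieval and computing `ndcg_at_k` against the relevance judgments. The resulting pairs could then serve as training data for a Direct Preference Optimization step.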