IR · AI · Oct 20, 2024

HyQE: Ranking Contexts with Hypothetical Query Embeddings

arXiv:2410.15262v1 · 29 citations · h-index: 6 · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses scalability and fine-tuning issues in context ranking for retrieval-augmented systems, though the advance is incremental because it combines existing embedding and LLM techniques.

The paper tackles the problem of ranking contexts in retrieval-augmented systems by introducing HyQE, a framework that uses a pre-trained LLM to hypothesize user queries from the retrieved contexts and ranks each context by the similarity between its hypothesized queries and the actual query. The authors report improved ranking performance across multiple benchmarks.

In retrieval-augmented systems, context ranking techniques are commonly employed to reorder the retrieved contexts based on their relevance to a user query. A standard approach is to measure this relevance through the similarity between contexts and queries in the embedding space. However, such similarity often fails to capture relevance. Alternatively, large language models (LLMs) have been used for ranking contexts. However, they can encounter scalability issues when the number of candidate contexts grows and the context window sizes of the LLMs remain constrained. Additionally, these approaches require fine-tuning LLMs with domain-specific data. In this work, we introduce a scalable ranking framework that combines embedding similarity and LLM capabilities without requiring LLM fine-tuning. Our framework uses a pre-trained LLM to hypothesize the user query based on the retrieved contexts and ranks the contexts based on the similarity between the hypothesized queries and the user query. Our framework is efficient at inference time and is compatible with many other retrieval and ranking techniques. Experimental results show that our method improves the ranking performance across multiple benchmarks. The complete code and data are available at https://github.com/zwc662/hyqe
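
The ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). Everything named here is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm` is a canned lookup standing in for a pre-trained LLM that hypothesizes queries, and taking the maximum similarity over a context's hypothesized queries is just one plausible way to aggregate, which the abstract does not specify.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_contexts(
    query: str,
    contexts: List[str],
    hypothesize: Callable[[str], List[str]],
) -> List[Tuple[str, float]]:
    """Rank contexts by similarity between the user query and the queries
    hypothesized from each context.

    Assumption: a context's score is the maximum similarity over its
    hypothesized queries; other aggregations are equally possible.
    """
    query_vec = embed(query)
    scored = []
    for ctx in contexts:
        hypothesized = hypothesize(ctx)  # in practice, a call to a pre-trained LLM
        score = max((cosine(query_vec, embed(h)) for h in hypothesized), default=0.0)
        scored.append((ctx, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical canned "LLM" output: queries each context might answer.
CANNED_QUERIES = {
    "Paris is the capital of France.": ["What is the capital of France?"],
    "The Nile flows through Egypt.": ["Which river flows through Egypt?"],
}


def fake_llm(context: str) -> List[str]:
    """Stand-in for an LLM that hypothesizes user queries from a context."""
    return CANNED_QUERIES.get(context, [])


ranking = rank_contexts("capital of France", list(CANNED_QUERIES), fake_llm)
for ctx, score in ranking:
    print(f"{score:.3f}  {ctx}")
```

In this sketch the hypothesized queries depend only on the context, not on the user query, so they could in principle be generated once and cached, leaving only embedding comparisons at query time.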

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
