IR · May 27

Beyond Similarity: Task-Aligned Retrieval for Language Models

arXiv:2605.27951

AI Analysis

For practitioners using RAG in rule-constrained domains (e.g., code compliance, policy enforcement), TAG offers a principled alternative to semantic search when topical relevance misaligns with task requirements.

TAG replaces similarity-based retrieval with applicability-based rule selection for RAG, achieving up to 12.2% improvement in rule-governed tasks while reducing retrieved context by up to 93%.

Retrieval-augmented generation (RAG) ranks passages by semantic similarity to the input, implicitly assuming that semantic similarity is a reliable indication of applicability in downstream tasks. This assumption breaks down when task success depends not on topical relevance but on applying the correct rules, constraints, or procedural guidance. In such settings, the most useful context may be the rule triggered by the input rather than the most semantically similar passage. We propose Task-Aligned Retrieval (TAG), a retrieval framework that replaces similarity-based retrieval with applicability-based rule selection. TAG transforms source documents into traceable condition-action rules, identifies which rules apply to a given input through pairwise LLM judgments, and generates the output conditioned only on the selected actions. We empirically observe that across Wikipedia NPOV rewriting, HumanEval with PEP 8 compliance, and NBA transaction reasoning on RuleArena, TAG consistently outperforms standard RAG, with the largest gains in high-mismatch settings (up to 12.2%) while reducing retrieved context by up to 93%. These results suggest that, in rule- and instruction-governed tasks, retrieval should optimize for applicability rather than for semantic similarity alone.
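The three stages named in the abstract (condition-action rules, pairwise applicability judgments, generation conditioned only on the selected actions) can be sketched as below. This is a minimal illustration and not the authors' implementation: the `Rule` fields, `select_applicable`, `build_prompt`, and especially `keyword_judge` are hypothetical names introduced here. In particular, `keyword_judge` is a toy keyword matcher standing in for the pairwise LLM judgment, and the three rules are hand-written paraphrases of PEP 8 guidance rather than rules extracted by the paper's pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    """A condition-action rule with a pointer back to its source (hypothetical schema)."""
    rule_id: str
    condition: str  # when the rule applies (here: space-separated trigger keywords)
    action: str     # the instruction to follow when the rule applies
    source: str     # traceability: where the rule came from


# A judge decides, for one (input, rule) pair, whether the rule applies.
Judge = Callable[[str, Rule], bool]


def keyword_judge(task_input: str, rule: Rule) -> bool:
    """Toy stand-in for a pairwise LLM judgment (assumption, not the paper's judge).

    The rule fires only if every keyword in its condition occurs in the input.
    """
    text = task_input.lower()
    return all(token in text for token in rule.condition.lower().split())


def select_applicable(task_input: str, rules: List[Rule], judge: Judge) -> List[Rule]:
    """Judge each (input, rule) pair independently and keep the rules that apply."""
    return [rule for rule in rules if judge(task_input, rule)]


def build_prompt(task_input: str, selected: List[Rule]) -> str:
    """Build generation context from the selected actions only, not whole passages."""
    lines = ["Follow these instructions:"]
    lines += [f"- {rule.action} [{rule.rule_id}]" for rule in selected]
    lines += ["", "Input:", task_input]
    return "\n".join(lines)


# Hand-written example rules paraphrasing PEP 8 guidance (illustrative only).
RULES = [
    Rule("R1", "lambda",
         "Use a def statement instead of assigning a lambda to a name.",
         "PEP 8, Programming Recommendations"),
    Rule("R2", "import",
         "Put imports on separate lines.",
         "PEP 8, Imports"),
    Rule("R3", "== none",
         "Compare to None with 'is' or 'is not'.",
         "PEP 8, Programming Recommendations"),
]

task_input = "square = lambda x: x * x"
selected = select_applicable(task_input, RULES, keyword_judge)
prompt = build_prompt(task_input, selected)
print(prompt)
```

Because the judge is passed in as a function, the keyword matcher could be swapped for a model-backed judgment without touching the selection or prompt-building steps. Only the actions of the rules that fire reach the prompt, which is the mechanism by which such a pipeline would shrink retrieved context relative to passing whole similar passages.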

