Logic-Oriented Retriever Enhancement via Contrastive Learning
This work addresses a bottleneck in retrieval-augmented generation for large language models, offering an incremental enhancement to existing retrievers that requires no external resources.
The paper tackles retrievers that overfit to surface similarity and fail on queries with complex logical relations in knowledge-intensive tasks. It introduces LORE, which uses fine-grained contrastive learning to activate the latent capacity for logical analysis in embeddings. The authors report consistent improvements in retrieval utility and downstream generation while maintaining efficiency, with datasets and code made publicly available.
Large language models (LLMs) struggle in knowledge-intensive tasks, as retrievers often overfit to surface similarity and fail on queries involving complex logical relations. The capacity for logical analysis is inherent in model representations but remains underutilized in standard training. LORE (Logic ORiented Retriever Enhancement) introduces fine-grained contrastive learning to activate this latent capacity, guiding embeddings toward evidence aligned with logical structure rather than shallow similarity. LORE requires no external supervision, resources, or pre-retrieval analysis, remains index-compatible, and consistently improves retrieval utility and downstream generation while maintaining efficiency. The datasets and code are publicly available at https://github.com/mazehart/Lore-RAG.
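To make the idea of contrastive retriever training concrete, the sketch below computes a generic InfoNCE-style loss for one query, one positive passage, and a set of negatives. It is an illustration under stated assumptions, not LORE's actual objective: the abstract does not specify the loss, so the function names (`cosine`, `info_nce_loss`), the temperature value, and the toy 2-D vectors are all hypothetical. The toy data treats a surface-similar but logically mismatched passage as a hard negative, which is one plausible reading of "fine-grained".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Generic InfoNCE loss: -log softmax score of the positive passage.

    Assumption: this is a standard contrastive objective used for
    illustration, not the loss defined in the LORE paper.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Hypothetical toy embeddings (2-D for readability).
query = [1.0, 0.0]
logical_evidence = [0.9, 0.1]      # passage aligned with the query's logic
surface_distractor = [0.6, 0.8]    # lexically similar, logically mismatched
unrelated = [0.0, 1.0]

# Loss when the logically aligned passage is the positive.
loss_good = info_nce_loss(query, logical_evidence,
                          [surface_distractor, unrelated])
# Loss when the surface distractor is wrongly treated as the positive.
loss_bad = info_nce_loss(query, surface_distractor,
                         [logical_evidence, unrelated])
```

With these toy vectors, `loss_good` is smaller than `loss_bad`, because the query embedding already sits closer to the logically aligned passage. Training with such a loss pushes embeddings in that direction, and since only the encoder changes, an existing vector index format can be kept.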