IR, AI, DB, LG | Jan 14

DSL-R1: From SQL to DSL for Training Retrieval Agents across Structured and Unstructured Data with Reinforcement Learning

arXiv:2603.21018 | h-index: 2
Originality: Highly original
AI Analysis

This work addresses hybrid retrieval in domains that require integrating structured and unstructured data, and it proposes a novel method for a known bottleneck.

The paper tackles the problem of bridging structured metadata and unstructured content for retrieval in complex domains. It proposes DSL-R1, a unified framework that combines logical reasoning with semantic matching, and reports a +12.3% improvement in Hit@1/3 on a large-scale industrial email benchmark.

Effective retrieval in complex domains requires bridging the gap between structured metadata and unstructured content. Existing systems typically isolate these capabilities, relying on either symbolic filtering or vector similarity, failing to capture their interplay. In this work, we propose DSL-R1, a unified framework that synergizes logical reasoning with semantic matching via a novel Domain-Specific Language (DSL). By embedding vector primitives within SQL-style operators, our approach leverages the complementary strengths of symbolic precision and semantic coverage. We further introduce a reinforcement learning mechanism where rule-based execution feedback and retrieval quality rewards jointly optimize the DSL generation, balancing structural correctness and semantic alignment. Evaluations on a large-scale industrial email benchmark demonstrate that DSL-R1 achieves a +12.3% improvement in Hit@1/3, consistently outperforming decoupled baselines and establishing a robust paradigm for hybrid retrieval.
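The two ideas in the abstract, a symbolic filter combined with a vector primitive and a reward that mixes execution feedback with retrieval quality, can be illustrated with a minimal Python sketch. This is not the paper's DSL or reward function: the `Email` fields, `hybrid_search`, `reward`, and the weights `w_exec` and `w_hit` are all hypothetical names and values chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical schema: structured metadata plus a content embedding.
    doc_id: str
    sender: str
    year: int
    embedding: list


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(emails, where, query_vec, top_k):
    """Apply a symbolic predicate, then rank the survivors by similarity.

    Roughly analogous to a SQL-style query such as
    WHERE <predicate> ORDER BY similarity(embedding, query) LIMIT top_k.
    """
    candidates = [e for e in emails if where(e)]
    ranked = sorted(
        candidates,
        key=lambda e: cosine(e.embedding, query_vec),
        reverse=True,
    )
    return [e.doc_id for e in ranked[:top_k]]


def hit_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold document appears in the top k results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


def reward(executed_ok, ranked_ids, gold_id, k=3, w_exec=0.2, w_hit=0.8):
    """Assumed reward shape: no credit for a query that fails to execute,
    partial credit for a valid query, full credit when it also hits."""
    if not executed_ok:
        return 0.0
    return w_exec + w_hit * hit_at_k(ranked_ids, gold_id, k)


emails = [
    Email("a", "alice", 2023, [1.0, 0.0]),
    Email("b", "alice", 2024, [0.9, 0.1]),
    Email("c", "bob", 2024, [1.0, 0.0]),
    Email("d", "alice", 2024, [0.0, 1.0]),
]

results = hybrid_search(
    emails,
    where=lambda e: e.sender == "alice" and e.year == 2024,
    query_vec=[1.0, 0.0],
    top_k=3,
)
print(results)
print(reward(True, results, gold_id="b", k=1))
```

In this toy data, document "c" matches the query vector exactly but is excluded by the metadata filter, which is the kind of interplay a purely vector-based retriever would miss. Gating the reward on successful execution is one simple way to balance structural correctness against semantic alignment; the actual reward design in DSL-R1 may differ.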

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
