IR AI DB LG
Jan 14

DSL-R1: From SQL to DSL for Training Retrieval Agents across Structured and Unstructured Data with Reinforcement Learning

Yunhai Hu, Junwei Zhou, Yumo Cao, Yitao Long, Yiwei Xu, Qiyi Jiang, Weiyao Wang, Xiaoyu Cao, Zhen Sun, Yiran Zou, Nan Du

arXiv:2603.21018
h-index: 2

Originality: Highly original

AI Analysis

This work addresses hybrid retrieval for domains that require integrating structured and unstructured data, and presents a novel method for a known bottleneck.

The paper tackles the problem of bridging structured metadata and unstructured content for retrieval in complex domains. It proposes DSL-R1, a unified framework that combines logical reasoning with semantic matching, and reports a +12.3% improvement in Hit@1/3 on a large-scale industrial email benchmark.

Effective retrieval in complex domains requires bridging the gap between structured metadata and unstructured content. Existing systems typically isolate these capabilities, relying on either symbolic filtering or vector similarity, failing to capture their interplay. In this work, we propose DSL-R1, a unified framework that synergizes logical reasoning with semantic matching via a novel Domain-Specific Language (DSL). By embedding vector primitives within SQL-style operators, our approach leverages the complementary strengths of symbolic precision and semantic coverage. We further introduce a reinforcement learning mechanism where rule-based execution feedback and retrieval quality rewards jointly optimize the DSL generation, balancing structural correctness and semantic alignment. Evaluations on a large-scale industrial email benchmark demonstrate that DSL-R1 achieves a +12.3% improvement in Hit@1/3, consistently outperforming decoupled baselines and establishing a robust paradigm for hybrid retrieval.
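The abstract does not show the DSL syntax or the reward formula, so the following is only a minimal sketch of the general idea: a symbolic filter over metadata narrows the candidates, a vector similarity ranks them, and a reward combines an execution check with Hit@k. Everything here is an assumption made for illustration, including the dictionary-based query shape, the names `run_hybrid_query`, `hit_at_k`, and `reward`, the toy two-dimensional embeddings, and the `exec_weight` value. None of it is taken from DSL-R1 itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Email:
    id: int
    sender: str          # structured metadata
    year: int            # structured metadata
    embedding: list      # toy stand-in for a learned text embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def run_hybrid_query(emails, query):
    """Run a hypothetical hybrid query: exact-match filter, then vector ranking.

    `query` is a dict with keys "where" (field -> required value),
    "semantic" (query embedding), and "limit". Raises ValueError if malformed.
    """
    if not isinstance(query, dict) or set(query) != {"where", "semantic", "limit"}:
        raise ValueError("malformed query")
    if not set(query["where"]) <= {"sender", "year"}:
        raise ValueError("unknown metadata field")
    # Symbolic step: keep only emails whose metadata matches every condition.
    candidates = [
        e for e in emails
        if all(getattr(e, field) == value for field, value in query["where"].items())
    ]
    # Semantic step: order the survivors by similarity to the query embedding.
    ranked = sorted(
        candidates,
        key=lambda e: cosine(e.embedding, query["semantic"]),
        reverse=True,
    )
    return [e.id for e in ranked[: query["limit"]]]


def hit_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold item appears in the top k results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


def reward(emails, query, gold_id, k=3, exec_weight=0.2):
    """Illustrative reward: a rule-based execution term plus a retrieval term.

    A query that fails to execute earns 0.0; one that executes earns
    `exec_weight`, plus the remaining weight if the gold email is in the top k.
    """
    try:
        ranked = run_hybrid_query(emails, query)
    except ValueError:
        return 0.0
    return exec_weight + (1.0 - exec_weight) * hit_at_k(ranked, gold_id, k)


EMAILS = [
    Email(1, "alice", 2023, [1.0, 0.0]),
    Email(2, "alice", 2024, [0.9, 0.1]),
    Email(3, "bob", 2024, [1.0, 0.0]),
    Email(4, "alice", 2024, [0.0, 1.0]),
]

QUERY = {
    "where": {"sender": "alice", "year": 2024},
    "semantic": [1.0, 0.0],
    "limit": 3,
}
```

In this toy setup, the filter removes email 3 even though its embedding matches the query exactly, which is the kind of interplay a purely vector-based retriever would miss. A real system would generate the query with a language model and use the scalar reward as the training signal for a policy-gradient update, which is outside the scope of this sketch.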
