DB · AI · May 29

Sophrosyne: Agentic Exploration of Relational Data Systems Needs Moderation

arXiv:2605.30862
58.4
AI Analysis

This work is significant for developers and users of Text2SQL agents interacting with relational databases, as it improves the efficiency and accuracy of SQL generation by addressing over-exploration.

The paper addresses Text2SQL agents that over-explore relational data systems when given fine-grained APIs, which leads to inaccurate SQL generation. The authors propose Sophrosyne, a data system environment that augments API responses with directives to moderate agent exploration, reducing over-exploration by 4.6x and boosting accuracy by up to 12.4% (approximately 4 percentage points).

Text2SQL agents powered by LLMs translate natural language intent into SQL by exploring the data system through tool calls before formulating the query. However, to ensure secure and scoped access, data systems construct environments with explicit API surfaces. We study and categorize these APIs exposed today as either coarse-grained or fine-grained and posit that choosing between them presents a fundamental tradeoff between cost-efficient exploration and accurate SQL generation. Most data systems expose fine-grained APIs, but this inadvertently disadvantages agents: they over-explore, incorporating irrelevant schema elements into their query formulation and producing inaccurate results. We argue that curbing over-exploration is key to the effective use of these API surfaces, and propose Sophrosyne, a data system environment that augments API responses with directives that guide the agent's exploration process. Initial results show that directives reduce over-exploration by 4.6x and boost accuracy by up to 12.4% (approx. 4 percentage points).
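
To make the idea of directive-augmented API responses concrete, here is a minimal sketch in Python. It is not the paper's implementation: the class name ModeratedEnvironment, the call budget, and the directive wording are all hypothetical, chosen only to illustrate how a fine-grained API surface (list tables, describe one table) might attach guidance to every response so the agent stops exploring and writes its query.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ModeratedEnvironment:
    """Toy fine-grained API surface; every response carries a directive.

    Hypothetical illustration: the budget rule and the directive text are
    assumptions, not the mechanism described in the paper.
    """
    conn: sqlite3.Connection
    budget: int = 3                       # assumed cap on exploration calls
    calls: int = field(default=0, init=False)

    def _table_names(self):
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]

    def _respond(self, payload):
        # Count the call, then wrap the payload with guidance for the agent.
        self.calls += 1
        if self.calls >= self.budget:
            directive = "Exploration budget reached: write the SQL query now."
        else:
            left = self.budget - self.calls
            directive = (f"{left} exploration call(s) left; "
                         "inspect only tables the question needs.")
        return {"result": payload, "directive": directive}

    def list_tables(self):
        """Fine-grained call: table names only, no column details."""
        return self._respond(self._table_names())

    def describe_table(self, table):
        """Fine-grained call: (column name, declared type) for one table."""
        # PRAGMA statements cannot take bound parameters, so the name is
        # checked against the catalog before it is interpolated.
        if table not in self._table_names():
            raise ValueError(f"unknown table: {table!r}")
        rows = self.conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        return self._respond([(row[1], row[2]) for row in rows])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
    env = ModeratedEnvironment(conn, budget=2)
    print(env.list_tables())
    print(env.describe_table("orders"))
```

In this sketch the directive is just a string the agent reads alongside the result. A real environment would presumably derive directives from the question and the exploration history rather than from a fixed call count.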

