DB AI · Jun 4

Data Flow Control: Data Safety Policies for AI Agents

arXiv:2606.05679 · 83.6 · Has Code

AI Analysis

For AI agents and data systems that need to enforce data safety constraints automatically within the database infrastructure, this work provides a practical and efficient solution.

The paper introduces Data Flow Control (DFC), a framework for enforcing data safety policies (e.g., regulatory, privacy) on tuple-level data flows within DBMS queries. Passant, a query rewriting layer, achieves ~0% overhead across five DBMS engines and outperforms alternatives by orders of magnitude.

Agents increasingly generate SQL, orchestrate pipelines, and automate data analysis on behalf of users. While recent work improves query correctness, correctness is not safety. A query may be semantically valid yet violate regulatory, privacy, or business constraints that govern how data may be combined and released. We argue that enforcing such constraints is fundamentally a data infrastructure problem. This paper introduces Data Flow Control (DFC), a framework to declaratively specify and guarantee policy enforcement over tuple-level data flows within a DBMS query. A key challenge is defining a policy language that is optimizer-invariant yet efficient to enforce at scale. We formalize data safety as aggregate predicates over provenance monomials and present Passant, a portable query rewriting layer that enforces DFC policies without materializing provenance. Across five DBMS engines (DuckDB, Umbra, PostgreSQL, DataFusion, and SQL Server), Passant achieves ~0% overhead and outperforms alternatives by orders of magnitude. As a result, Data Flow Control is the first step towards moving data safety from prompts and post-hoc checks into the data infrastructure. Data Flow Control is available open source at https://github.com/dataflowcontrol/data-flow-control.
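The abstract does not show DFC's policy syntax or Passant's rewrite rules, so the following is only a minimal sketch of the general idea of enforcing an aggregate predicate by rewriting a query before it runs. It assumes a hypothetical policy ("each released row must draw on at least two distinct patients"), a hypothetical `visits` table, and a hypothetical helper `enforce_min_contributors`; it uses Python's built-in `sqlite3` rather than any of the engines named above, and it is not the paper's implementation.

```python
import sqlite3

# Hypothetical policy: every output row must be derived from at least
# MIN_PATIENTS distinct patients. Not taken from the paper.
MIN_PATIENTS = 2


def enforce_min_contributors(grouped_query: str, id_column: str, k: int) -> str:
    """Naive illustrative rewrite: append a HAVING clause to a GROUP BY query.

    Assumes the input ends with its GROUP BY clause and has no HAVING,
    ORDER BY, or LIMIT of its own. A real rewriter would work on a parsed
    query, not on strings.
    """
    return f"{grouped_query} HAVING COUNT(DISTINCT {id_column}) >= {k}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id INTEGER, diagnosis TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "flu", 100.0), (2, "flu", 150.0), (3, "rare", 900.0)],
)

# A semantically valid, agent-generated query that would expose a
# single patient's cost through the "rare" group.
agent_query = "SELECT diagnosis, AVG(cost) FROM visits GROUP BY diagnosis"
safe_query = enforce_min_contributors(agent_query, "patient_id", MIN_PATIENTS)

unsafe_rows = conn.execute(agent_query + " ORDER BY diagnosis").fetchall()
safe_rows = conn.execute(safe_query + " ORDER BY diagnosis").fetchall()

print(unsafe_rows)  # includes the single-patient "rare" group
print(safe_rows)    # the "rare" group is filtered out by the policy
```

The point of the sketch is that the check runs inside the query itself, so no separate provenance table is built and the result is filtered before it leaves the database.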
