AI · DB · Nov 20, 2025

Trustworthy AI in the Agentic Lakehouse: from Concurrency to Governance

arXiv:2511.16402v1 · 4 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enabling trustworthy agentic workflows in enterprises. It is an incremental advance: it builds on existing lakehouse concepts and adapts them to agent-specific needs.

The paper tackles the problem of making AI agents trustworthy enough to work on enterprise production data. It traces the obstacle to infrastructure limitations, proposes Bauplan, an agent-first lakehouse design that provides data and compute isolation, and illustrates the design with a reference implementation of a self-healing pipeline.

Even as AI capabilities improve, most enterprises do not consider agents trustworthy enough to work on production data. In this paper, we argue that the path to trustworthy agentic workflows begins with solving the infrastructure problem first: traditional lakehouses are not suited for agent access patterns, but if we design one around transactions, governance follows. In particular, we draw an operational analogy to MVCC in databases and show why a direct transplant fails in a decoupled, multi-language setting. We then propose an agent-first design, Bauplan, that reimplements data and compute isolation in the lakehouse. We conclude by sharing a reference implementation of a self-healing pipeline in Bauplan, which seamlessly couples agent reasoning with all the desired guarantees for correctness and trust.
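The transactional idea in the abstract can be illustrated with a toy model. The sketch below is not Bauplan's API: `Catalog`, `branch`, `write`, `merge`, and `MergeRejected` are hypothetical names chosen for illustration, and the in-memory dictionaries stand in for real table snapshots. It assumes one plausible reading of "data isolation": an agent writes only to its own branch, and its changes reach `main` through a single pointer swap that happens only if a validation check passes and `main` has not moved in the meantime.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple

# Toy model only. All names here are hypothetical, not Bauplan's API.
Snapshot = Dict[str, Tuple[tuple, ...]]  # table name -> immutable rows


class MergeRejected(Exception):
    """Raised when a branch fails validation or its base is stale."""


@dataclass
class Catalog:
    refs: Dict[str, Snapshot] = field(default_factory=lambda: {"main": {}})
    bases: Dict[str, Snapshot] = field(default_factory=dict)

    def branch(self, name: str, source: str = "main") -> None:
        # Remember the exact snapshot we forked from, then copy the mapping.
        # Rows are immutable tuples, so sharing them between refs is safe.
        self.bases[name] = self.refs[source]
        self.refs[name] = dict(self.refs[source])

    def write(self, ref: str, table: str, rows: Iterable[tuple]) -> None:
        # Copy-on-write: build a new snapshot instead of mutating in place.
        snapshot = dict(self.refs[ref])
        snapshot[table] = tuple(rows)
        self.refs[ref] = snapshot

    def read(self, ref: str, table: str) -> Tuple[tuple, ...]:
        return self.refs[ref].get(table, ())

    def merge(self, source: str, check: Callable[[Snapshot], bool],
              target: str = "main") -> None:
        # Optimistic concurrency: refuse if the target moved since branching.
        if self.refs[target] is not self.bases[source]:
            raise MergeRejected("target changed since the branch was created")
        snapshot = self.refs[source]
        if not check(snapshot):
            raise MergeRejected("validation check failed")
        self.refs[target] = snapshot  # all-or-nothing pointer swap


def no_negative_amounts(snapshot: Snapshot) -> bool:
    return all(amount >= 0 for _, amount in snapshot.get("orders", ()))


catalog = Catalog()
catalog.write("main", "orders", [(1, 10.0), (2, 25.5)])
catalog.branch("agent-fix")
catalog.write("agent-fix", "orders", [(1, 10.0), (2, -5.0)])  # faulty repair

try:
    catalog.merge("agent-fix", no_negative_amounts)
except MergeRejected as err:
    print(f"merge rejected: {err}")

print(catalog.read("main", "orders"))  # main still holds the original rows
```

In this sketch a faulty agent write never becomes visible on `main`, because readers of `main` only ever see a snapshot that passed the check. A real lakehouse would also need to isolate compute and to coordinate multi-table commits, which this toy model does not attempt.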

