No Certificate, No Execution: Certified Traces as a Foundation for Trustworthy AI Agents
For developers and deployers of AI agents in high-stakes domains, this provides a formal framework to ensure actions are permissible before execution, addressing a critical trust gap not covered by existing guardrails or audits.
The paper proposes a Proposal-Certification-Execution (PCE) architecture for trustworthy AI agents, where execution only occurs for traces certified as permissible under a policy system, formalized as 'no certificate, no execution'. It connects this to proof-carrying execution and zero-knowledge certificates, shifting evaluation from output accuracy to what traces can be safely certified.
We argue that trustworthy AI agents, especially in high-stakes and policy-governed domains, should make execution conditional on certified traces rather than rely only on stronger generative models, output-level guardrails, or post-hoc audits. A generative agent may propose recommendations, tool calls, reports, or actions, but generation is not permission: an action may be computable yet impermissible, and individually permissible actions may compose into an impermissible trace. We formalize trustworthy agency through a \textbf{Proposal--Certification--Execution (PCE)} architecture: a probabilistic generating machine $M_G$ proposes candidate execution traces, a \textbf{Permissibility Machine} $M_Π$ certifies proposed traces under a policy system $Π$, and execution proceeds only for certified traces. The executable trace language is $L_{\mathrm{exec}} = L_G \cap L_{\mathrm{cert}}(M_Π)$. Before execution, a trace is a structured pre-execution record submitted for certification: it specifies intended steps, evidence, proposed tool calls, approvals, replayable computations, credentials, and execution conditions. This perspective complements chain-of-thought monitorability: visible reasoning may help detect misbehavior, but monitorability is not certifiability, and reasoning is only one component of a broader execution trace. The formal principle is simple: an agent-generated trace should execute only when it carries a checkable certificate witnessing permissibility under $Π$: \textbf{no certificate, no execution}. We develop certified traces and Permissibility Machines as foundations for trustworthy AI agents, connect trace certification to proof-carrying execution, proof memory, privacy, and zero-knowledge certificates, and propose evaluating agents by what generated traces can be safely certified for execution, not by output accuracy alone.