Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Yining Hong, Yining She, Eunsuk Kang, Christopher S. Timperley, Christian Kästner

CMU

arXiv:2604.1557996.5h-index: 15Has Code

Predicted impact top 3% in SE · last 90 daysOriginality Synthesis-oriented

AI Analysis

For developers of domain-specific AI agents in high-stakes business settings, this work provides a practical method to guarantee safety and security requirements, though it is incremental as it builds on existing guardrail concepts.

The paper studies symbolic guardrails for AI agents, finding that 85% of 80 benchmarks lack concrete policies, but 74% of specified policy requirements can be enforced by symbolic guardrails, improving safety and security without sacrificing utility.

AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward strong safety and security guarantees for AI agents. Our three-part study includes a systematic review of 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, an analysis of which policy requirements can be guaranteed by symbolic guardrails, and an evaluation of how symbolic guardrails affect safety, security, and agent success on $τ^2$-Bench, CAR-bench, and MedAgentBench. We find that 85\% of benchmarks lack concrete policies, relying instead on underspecified high-level goals or common sense. Among the specified policies, 74\% of policy requirements can be enforced by symbolic guardrails, often using simple, low-cost mechanisms. These guardrails improve safety and security without sacrificing agent utility. Overall, our results suggest that symbolic guardrails are a practical and effective way to guarantee some safety and security requirements, especially for domain-specific AI agents. We release all codes and artifacts at https://github.com/hyn0027/agent-symbolic-guardrails.

View on arXiv PDF Code

Similar