LGJun 13, 2024

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

arXiv:2406.09187v379 citations
Originality Incremental advance
AI Analysis

It addresses safety issues for LLM agents in domains like healthcare and web navigation, though it appears incremental as it builds on existing LLM reasoning with added guardrail mechanisms.

The paper tackles the safety and security concerns of LLM agents by introducing GuardAgent, a guardrail agent that dynamically checks and safeguards target agents' actions against given safety requests, achieving over 98% and 83% guardrail accuracies on two new benchmarks.

The rapid advancement of large language model (LLM) agents has raised new concerns regarding their safety and security. In this paper, we propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution. By performing the code execution, GuardAgent can deterministically follow the safety guard request and safeguard target agents. In both steps, an LLM is utilized as the reasoning component, supplemented by in-context demonstrations retrieved from a memory module storing experiences from previous tasks. In addition, we propose two novel benchmarks: EICU-AC benchmark to assess the access control for healthcare agents and Mind2Web-SC benchmark to evaluate the safety policies for web agents. We show that GuardAgent effectively moderates the violation actions for different types of agents on these two benchmarks with over 98% and 83% guardrail accuracies, respectively. Project page: https://guardagent.github.io/

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes