AI · Feb 3

Risky-Bench: Probing Agentic Safety Risks under Real-World Deployment

arXiv:2602.03100v1 · 6 citations · h-index: 28
Originality: Highly original
AI Analysis

The work addresses safety risks for AI agents in real-world applications and offers an extensible, structured evaluation methodology that is a new approach rather than an incremental improvement.

The paper tackles the problem of evaluating safety risks in LLM-based agents under real-world deployment. It proposes Risky-Bench, a framework for systematic and adaptable evaluation, and uses it to uncover substantial safety risks in state-of-the-art agents, for example in life-assist scenarios.

Large Language Models (LLMs) are increasingly deployed as agents that operate in real-world environments, introducing safety risks beyond linguistic harm. Existing agent safety evaluations rely on risk-oriented tasks tailored to specific agent settings, resulting in limited coverage of the safety risk space and failing to assess agent safety behavior during long-horizon, interactive task execution in complex real-world deployments. Moreover, their specialization to particular agent settings limits their adaptability across diverse agent configurations. To address these limitations, we propose Risky-Bench, a framework that enables systematic agent safety evaluation grounded in real-world deployment. Risky-Bench organizes evaluation around domain-agnostic safety principles to derive context-aware safety rubrics that delineate the safety space, and systematically evaluates safety risks across this space through realistic task execution under varying threat assumptions. When applied to life-assist agent settings, Risky-Bench uncovers substantial safety risks in state-of-the-art agents under realistic execution conditions. Moreover, as a well-structured evaluation pipeline, Risky-Bench is not confined to life-assist scenarios and can be adapted to other deployment settings to construct environment-specific safety evaluations, providing an extensible methodology for agent safety assessment.
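The abstract describes the evaluation space as structured: domain-agnostic safety principles are grounded into context-aware rubrics, and each rubric is exercised under varying threat assumptions. A minimal Python sketch of that coverage structure, assuming a simple cross-product instantiation, is given here. It is not Risky-Bench's implementation: the `Rubric` class, `derive_rubrics`, `evaluation_grid`, and every principle, context, and threat label are hypothetical, and the string template stands in for whatever rubric derivation the paper actually uses, which the abstract does not specify.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical illustration only; not Risky-Bench's actual API or data.

@dataclass(frozen=True)
class Rubric:
    principle: str   # a domain-agnostic safety principle
    context: str     # a deployment context the principle is grounded in
    criterion: str   # the context-aware check derived from the pair

def derive_rubrics(principles, contexts):
    """Instantiate every principle in every context (placeholder derivation)."""
    return [
        Rubric(p, c, f"In '{c}', the agent must respect: {p}")
        for p, c in product(principles, contexts)
    ]

def evaluation_grid(rubrics, threat_assumptions):
    """Pair each rubric with each threat assumption to form evaluation cells."""
    return list(product(rubrics, threat_assumptions))

# Hypothetical life-assist examples.
principles = ["avoid physical harm", "protect private data"]
contexts = ["medication reminders", "smart-home control"]
threats = ["benign user", "malicious user", "injected environment content"]

rubrics = derive_rubrics(principles, contexts)
grid = evaluation_grid(rubrics, threats)
print(len(rubrics), len(grid))  # 4 12
```

Making the grid explicit shows how coverage scales: each added principle, context, or threat assumption grows the set of evaluation cells predictably, and retargeting to another deployment setting amounts, in this sketch, to swapping the list of contexts.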
