CR
Oct 14, 2025
Breaking Guardrails, Facing Walls: Insights on Adversarial AI for Defenders & Researchers
Giacomo Bertollo, Naz Bodemir, Jonah Burgess
Drawing on an analysis of 500 CTF participants, this paper shows that simple AI guardrails were readily bypassed with common techniques, while layered multi-step defenses still posed significant challenges. The findings offer concrete insights for building safer AI systems.