OS · Apr 16

Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy

Shawn Wanxiang Zhong, Junxuan Liao, Jing Liu, Mai Zheng, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

arXiv:2604.13536 · 54.7 · h-index: 59

AI Analysis

This work addresses the safety-autonomy tradeoff for AI coding agents by redesigning the filesystem to provide better information and control, offering a practical solution to a pressing problem in AI safety.

AI coding agents frequently corrupt data, delete files, and leak secrets when operating on users' filesystems. The authors propose YoloFS, an agent-native filesystem that uses staging, snapshots, and progressive permissions to enable agent self-correction in 8 of 11 tasks with hidden side effects while reducing user interactions on routine tasks.

AI coding agents operate directly on users' filesystems, where they regularly corrupt data, delete files, and leak secrets. Current approaches force a tradeoff between safety and autonomy: unrestricted access risks harm, while frequent permission prompts burden users and block agents. To understand this problem, we conduct the first systematic study of agent filesystem misuse, analyzing 290 public reports across 13 frameworks. Our analysis reveals that today's agents have limited information about their filesystem effects and insufficient control over them. We therefore argue for shifting this information and control to the filesystem itself. Based on this principle, we design YoloFS, an agent-native filesystem with three techniques. Staging isolates all mutations before commit, giving users corrective control. Snapshots extend this control to agents, letting them detect and correct their own mistakes. Progressive permission provides users with preventive control by gating access with minimal interaction. To evaluate YoloFS, we introduce a new methodology that captures user-agent-filesystem interactions. On 11 tasks with hidden side effects, YoloFS enables agent self-correction in 8 while keeping all effects staged and reviewable. On 112 routine tasks, YoloFS requires fewer user interactions while matching the baseline success rate.
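The three techniques named in the abstract can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not YoloFS's implementation or API: the class `StagedFS`, its methods, the `ask` callback, and the once-per-top-level-directory permission policy are all hypothetical names and choices made for illustration. It shows writes and deletes landing in a staging overlay, a snapshot that lets the agent undo its own mistake, and a permission prompt that fires only the first time a directory is touched.

```python
from dataclasses import dataclass, field

_DELETED = object()  # marker for a staged deletion


@dataclass
class StagedFS:
    """Toy overlay (hypothetical): mutations stay staged until commit()."""
    base: dict                                      # committed state: path -> contents
    staged: dict = field(default_factory=dict)      # pending writes and deletes
    snapshots: list = field(default_factory=list)   # saved copies of `staged`
    granted: set = field(default_factory=set)       # directories approved so far

    def _check(self, path, ask):
        # Assumed policy: prompt once per top-level directory, then remember it.
        prefix = path.split("/", 1)[0]
        if prefix not in self.granted:
            if not ask(prefix):
                raise PermissionError(path)
            self.granted.add(prefix)

    def read(self, path):
        # Staged state shadows committed state.
        value = self.staged.get(path, self.base.get(path))
        if value is _DELETED or value is None:
            raise FileNotFoundError(path)
        return value

    def write(self, path, data, ask):
        self._check(path, ask)
        self.staged[path] = data

    def delete(self, path, ask):
        self._check(path, ask)
        self.staged[path] = _DELETED

    def snapshot(self):
        # Save the current staged state and return its id.
        self.snapshots.append(dict(self.staged))
        return len(self.snapshots) - 1

    def rollback(self, snap_id):
        # Restore the staged state saved by snapshot().
        self.staged = dict(self.snapshots[snap_id])

    def diff(self):
        # Reviewable summary of pending effects.
        return {p: ("delete" if v is _DELETED else "write")
                for p, v in self.staged.items()}

    def commit(self):
        # Apply staged effects to the committed state.
        for path, value in self.staged.items():
            if value is _DELETED:
                self.base.pop(path, None)
            else:
                self.base[path] = value
        self.staged.clear()


# Usage: the agent deletes a file by mistake, rolls back, then commits.
fs = StagedFS(base={"src/app.py": "print('hi')", "src/.env": "SECRET=1"})
prompts = []
ask = lambda prefix: prompts.append(prefix) or True  # user approves every prompt

snap = fs.snapshot()
fs.write("src/app.py", "print('hello')", ask)
fs.delete("src/.env", ask)        # unintended side effect, still only staged
fs.rollback(snap)                 # agent self-corrects
fs.write("src/app.py", "print('hello')", ask)
fs.commit()
```

In this run the user is prompted once (for `src`), the accidental deletion never reaches the committed state, and only the intended edit is applied at commit. A real filesystem would need to handle directories, metadata, and concurrent access, none of which this sketch attempts.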
