LG · Oct 30, 2025

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko

arXiv:2510.26328v1 · 19 citations · h-index: 3 · Has Code

Originality: Incremental advance

AI Analysis

This work highlights a critical security problem for users of frontier LLMs: even advanced agent frameworks remain vulnerable to realistic attacks. The contribution is rated incremental because it builds on known prompt injection issues.

The paper examines the security of Agent Skills and demonstrates that they enable simple prompt injections: malicious instructions hidden in skill files or referenced scripts can exfiltrate sensitive data, such as passwords, and bypass system-level guardrails. Its examples show how a benign, task-specific approval can carry over to closely related but harmful actions.

Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company made a step towards this by introducing Agent Skills, a framework that equips agents with new knowledge based on instructions stored in simple markdown files. Although Agent Skills can be a very useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced scripts to exfiltrate sensitive data, such as internal files or passwords. Importantly, we show how to bypass system-level guardrails of a popular coding agent: a benign, task-specific approval with the "Don't ask again" option can carry over to closely related but harmful actions. Overall, we conclude that despite ongoing research efforts and scaling model capabilities, frontier LLMs remain vulnerable to very simple prompt injections in realistic scenarios. Our code is available at https://github.com/aisa-group/promptinject-agent-skills.
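The approval carry-over described in the abstract can be illustrated with a toy model. The sketch below is not the coding agent's actual logic, which the abstract does not specify. It assumes a hypothetical `ApprovalCache` class that remembers "Don't ask again" decisions at one of two granularities, and the example commands and script names are invented for illustration. It shows only why a coarse approval key can cover a different, unreviewed command, while an exact-match key does not.

```python
import shlex


class ApprovalCache:
    """Toy "Don't ask again" store (hypothetical, not any real agent's logic)."""

    def __init__(self, granularity):
        # "program": remember only the executable name (coarse key).
        # "exact":   remember the full tokenized command (strict key).
        if granularity not in ("program", "exact"):
            raise ValueError("granularity must be 'program' or 'exact'")
        self.granularity = granularity
        self._approved = set()

    def _key(self, command):
        tokens = tuple(shlex.split(command))
        if not tokens:
            raise ValueError("empty command")
        return tokens[:1] if self.granularity == "program" else tokens

    def approve(self, command):
        """Record that the user approved this command with "Don't ask again"."""
        self._approved.add(self._key(command))

    def is_preapproved(self, command):
        """Return True if the command would run without a new prompt."""
        return self._key(command) in self._approved


# Invented example commands: the user reviews and approves only the first one.
benign = "python scripts/format_report.py report.md"
related = "python scripts/backup.py --upload notes.txt"

coarse = ApprovalCache("program")
coarse.approve(benign)

strict = ApprovalCache("exact")
strict.approve(benign)

print(coarse.is_preapproved(related))  # True: the approval carries over
print(strict.is_preapproved(related))  # False: the user would be asked again
```

Under these assumptions, the coarse cache treats every `python` invocation as approved once one of them is, which mirrors the carry-over risk the authors describe. Exact matching avoids that at the cost of more prompts.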
