Klaudia Krawiecka

CR
h-index4
5papers
46citations
Novelty51%
AI Score50

5 Papers

CRMar 16Code
How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu et al. · eth-zurich

LLM based agents are increasingly deployed in high stakes settings where they process external data sources such as emails, documents, and code repositories. This creates exposure to indirect prompt injection attacks, where adversarial instructions embedded in external content manipulate agent behavior without user awareness. A critical but underexplored dimension of this threat is concealment: since users tend to observe only an agent's final response, an attack can conceal its existence by presenting no clue of compromise in the final user facing response while successfully executing harmful actions. This leaves users unaware of the manipulation and likely to accept harmful outcomes as legitimate. We present findings from a large scale public red teaming competition evaluating this dual objective across three agent settings: tool calling, coding, and computer use. The competition attracted 464 participants who submitted 272000 attack attempts against 13 frontier models, yielding 8648 successful attacks across 41 scenarios. All models proved vulnerable, with attack success rates ranging from 0.5% (Claude Opus 4.5) to 8.5% (Gemini 2.5 Pro). We identify universal attack strategies that transfer across 21 of 41 behaviors and multiple model families, suggesting fundamental weaknesses in instruction following architectures. Capability and robustness showed weak correlation, with Gemini 2.5 Pro exhibiting both high capability and high vulnerability. To address benchmark saturation and obsoleteness, we will endeavor to deliver quarterly updates through continued red teaming competitions. We open source the competition environment for use in evaluations, along with 95 successful attacks against Qwen that did not transfer to any closed source model. We share model-specific attack data with respective frontier labs and the full dataset with the UK AISI and US CAISI to support robustness research.

CRSep 5, 2017Code
SafeKeeper: Protecting Web Passwords using Trusted Execution Environments

Klaudia Krawiecka, Arseny Kurnikov, Andrew Paverd et al.

Passwords are undoubtedly the most dominant user authentication mechanism on the web today. Although they are inexpensive and easy-to-use, security concerns of password-based authentication are serious. Phishing and theft of password databases are two critical concerns. The tendency of users to re-use passwords across different services exacerbates the impact of these two concerns. Current solutions addressing these concerns are not fully satisfactory: they typically address only one of the two concerns; they do not protect passwords from rogue servers; they do not provide any verifiable evidence of their (server-side) adoption to users; and they face deployability challenges in terms of the cost for service providers and/or ease-of-use for end users. We present SafeKeeper, a comprehensive approach to protect the confidentiality of passwords in web authentication systems. Unlike previous approaches, SafeKeeper protects user passwords against very strong adversaries, including rogue servers and sophisticated external phishers. It is relatively inexpensive to deploy as it (i) uses widely available hardware security mechanisms like Intel SGX, (ii) is integrated into popular web platforms like WordPress, and (iii) has small performance overhead. We describe a variety of challenges in designing and implementing such a system, and how we overcome them. Through an 86-participant user study, and systematic analysis and experiments, we demonstrate the usability, security and deployability of SafeKeeper, which is available as open-source.

CRSep 10, 2025
Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations

Ron F. Del Rosario, Klaudia Krawiecka, Christian Schroeder de Witt

As Large Language Model (LLM) agents become increasingly capable of automating complex, multi-step tasks, the need for robust, secure, and predictable architectural patterns is paramount. This paper provides a comprehensive guide to the ``Plan-then-Execute'' (P-t-E) pattern, an agentic design that separates strategic planning from tactical execution. We explore the foundational principles of P-t-E, detailing its core components - the Planner and the Executor - and its architectural advantages in predictability, cost-efficiency, and reasoning quality over reactive patterns like ReAct (Reason + Act). A central focus is placed on the security implications of this design, particularly its inherent resilience to indirect prompt injection attacks by establishing control-flow integrity. We argue that while P-t-E provides a strong foundation, a defense-in-depth strategy is necessary, and we detail essential complementary controls such as the Principle of Least Privilege, task-scoped tool access, and sandboxed code execution. To make these principles actionable, this guide provides detailed implementation blueprints and working code references for three leading agentic frameworks: LangChain (via LangGraph), CrewAI, and AutoGen. Each framework's approach to implementing the P-t-E pattern is analyzed, highlighting unique features like LangGraph's stateful graphs for re-planning, CrewAI's declarative tool scoping for security, and AutoGen's built-in Docker sandboxing. Finally, we discuss advanced patterns, including dynamic re-planning loops, parallel execution with Directed Acyclic Graphs (DAGs), and the critical role of Human-in-the-Loop (HITL) verification, to offer a complete strategic blueprint for architects, developers, and security engineers aiming to build production-grade, resilient, and trustworthy LLM agents.

CRApr 7
Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries

Andrew Kurtz, Klaudia Krawiecka

The governance of artificial intelligence has a blind spot: the machine identities that AI systems use to act. AI agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1, yet no integrated framework exists to govern them. A single ungoverned automated agent produced $5.4-10 billion in losses in the 2024 CrowdStrike outage; nation-state actors including Silk Typhoon and Salt Typhoon have operationalized ungoverned machine credentials as primary espionage vectors against critical infrastructure. This paper makes four original contributions. First, the AI-Identity Risk Taxonomy (AIRT): a comprehensive enumeration of 37 risk sub-categories across eight domains, each grounded in documented incidents, regulatory recognition, practitioner prevalence data, and threat intelligence. Second, the Machine Identity Governance Taxonomy (MIGT): an integrated six-domain governance framework simultaneously addressing the technical governance gap, the regulatory compliance gap, and the cross-jurisdictional coordination gap that existing frameworks address only in isolation. Third, a foreign state actor threat model for enterprise identity governance, establishing that Silk Typhoon, Salt Typhoon, Volt Typhoon, and North Korean AI-enhanced identity fraud operations have already operationalized AI identity vulnerabilities as active attack vectors. Fourth, a cross-jurisdictional regulatory alignment structure mapping enterprise AI identity governance obligations under EU, US, and Chinese frameworks simultaneously, identifying irreconcilable conflicts and providing a governance mechanism for managing them. A four-phase implementation roadmap translates the MIGT into actionable enterprise programs.

CRFeb 8, 2022
BeeHIVE: Behavioral Biometric System based on Object Interactions in Smart Environments

Klaudia Krawiecka, Simon Birnbach, Simon Eberz et al.

The lack of standard input interfaces in the Internet of Things (IoT) ecosystems presents a challenge in securing such infrastructures. To tackle this challenge, we introduce a novel behavioral biometric system based on naturally occurring interactions with objects in smart environments. This biometric leverages existing sensors to authenticate users without requiring any hardware modifications of existing smart home devices. The system is designed to reduce the need for phone-based authentication mechanisms, on which smart home systems currently rely. It requires the user to approve transactions on their phone only when the user cannot be authenticated with high confidence through their interactions with the smart environment. We conduct a real-world experiment that involves 13 participants in a company environment, using this experiment to also study mimicry attacks on our proposed system. We show that this system can provide seamless and unobtrusive authentication while still staying highly resistant to zero-effort, video, and in-person observation-based mimicry attacks. Even when at most 1% of the strongest type of mimicry attacks are successful, our system does not require the user to take out their phone to approve legitimate transactions in more than 80% of cases for a single interaction. This increases to 92% of transactions when interactions with more objects are considered.