CRAIOct 14, 2025

PromptLocate: Localizing Prompt Injection Attacks

arXiv:2510.12252v217 citationsh-index: 9
Originality Highly original
AI Analysis

This addresses the need for post-attack forensic analysis and data recovery in AI security, presenting a novel solution to an unexplored problem.

The paper tackles the problem of localizing prompt injection attacks in large language models by proposing PromptLocate, which accurately identifies injected instructions and data in contaminated data across multiple attack scenarios.

Prompt injection attacks deceive a large language model into completing an attacker-specified task instead of its intended task by contaminating its input data with an injected prompt, which consists of injected instruction(s) and data. Localizing the injected prompt within contaminated data is crucial for post-attack forensic analysis and data recovery. Despite its growing importance, prompt injection localization remains largely unexplored. In this work, we bridge this gap by proposing PromptLocate, the first method for localizing injected prompts. PromptLocate comprises three steps: (1) splitting the contaminated data into semantically coherent segments, (2) identifying segments contaminated by injected instructions, and (3) pinpointing segments contaminated by injected data. We show PromptLocate accurately localizes injected prompts across eight existing and eight adaptive attacks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes