StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models
This paper addresses a security vulnerability in LLM-powered AI systems used for business applications, though the contribution is incremental because it builds on known prompt injection weaknesses.
The paper tackles the challenge of attacking black-box LLM-powered tabular agents, which resist standard prompt injection because they impose strict data formats. It introduces StruPhantom, an evolutionary attack method that achieves over 50% higher success rates than baselines in forcing the agent's response to contain malicious content such as phishing links.
The proliferation of autonomous agents powered by large language models (LLMs) has revolutionized popular business applications dealing with tabular data, i.e., tabular agents. Although LLMs are known to be vulnerable to prompt injection attacks from external data sources, tabular agents impose strict data formats and predefined rules on the attacker's payload, which is ineffective unless the agent navigates multiple layers of structural data to incorporate it. To address this challenge, we present a novel attack termed StruPhantom, which specifically targets black-box LLM-powered tabular agents. Our attack designs an evolutionary optimization procedure that continually refines attack payloads via the proposed constrained Monte Carlo Tree Search augmented by an off-topic evaluator. StruPhantom helps systematically explore and exploit the weaknesses of target applications to achieve goal hijacking. Our evaluation validates the effectiveness of StruPhantom across various attack scenarios and LLM-based agents, including those on real-world platforms. Our attack achieves over 50% higher success rates than baselines in forcing the application's response to contain phishing links or malicious code.
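The abstract names a constrained Monte Carlo Tree Search but gives no algorithmic detail, so the following is only a generic sketch of that search pattern on a harmless toy problem, not the authors' procedure. Everything in it is an assumption made for illustration: the string alphabet, the `TARGET` objective, the length limit used as the "constraint", and the names `satisfies_constraints`, `evaluate`, and `search`. The off-topic evaluator mentioned in the abstract is not modeled; `evaluate` is a simple stand-in scoring function.

```python
import math
import random

# All constants below are toy assumptions, not taken from the paper.
ALPHABET = "abc"    # symbols a refinement step may append
TARGET = "abcabc"   # stand-in objective the search tries to match
MAX_LEN = 6         # stand-in structural constraint on candidates


def satisfies_constraints(candidate: str) -> bool:
    """Hypothetical format rule: bounded length, allowed symbols only."""
    return len(candidate) <= MAX_LEN and all(ch in ALPHABET for ch in candidate)


def evaluate(candidate: str) -> float:
    """Toy score in [0, 1]: fraction of TARGET positions matched."""
    matches = sum(1 for a, b in zip(candidate, TARGET) if a == b)
    return matches / len(TARGET)


class Node:
    def __init__(self, candidate, parent=None):
        self.candidate = candidate
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0
        # Constraint step: refinements that break the rule are never expanded.
        self.untried = [candidate + ch for ch in ALPHABET
                        if satisfies_constraints(candidate + ch)]

    def ucb1(self, c=1.4):
        """Mean score plus an exploration bonus (standard UCB1)."""
        mean = self.total / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node("")
    best, best_score = "", evaluate("")
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # Expansion: add one untried, constraint-satisfying refinement.
        if node.untried:
            cand = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(cand, parent=node)
            node.children.append(child)
            node = child
        # Evaluation: score the candidate and track the best one seen.
        score = evaluate(node.candidate)
        if score > best_score:
            best, best_score = node.candidate, score
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best, best_score


if __name__ == "__main__":
    print(search())
```

The design point the sketch illustrates is that constraints are enforced at expansion time, so the search budget is spent only on candidates that satisfy the format rule, while the evaluator's score steers which branches are refined further.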