CRAICLSEJun 8, 2023

Prompt Injection attack against LLM-integrated Applications

arXiv:2306.05499v3765 citationsh-index: 40
Originality Incremental advance
AI Analysis

This work addresses critical security vulnerabilities in widely used LLM applications, highlighting severe risks like arbitrary LLM usage and prompt theft, though it is incremental as it builds on traditional web injection attacks.

The study tackled the security risks of prompt injection attacks on LLM-integrated applications by developing HouYi, a black-box attack technique, which revealed that 31 out of 36 tested applications were vulnerable, including Notion with potential impact on millions of users.

Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes