CR AI CL LGOct 19, 2023

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong

arXiv:2310.12815v545.9386 citationsh-index: 53Has Code

Originality Incremental advance

AI Analysis

This work addresses the security problem of prompt injection for developers and researchers in AI safety, offering a foundational benchmark but is incremental as it builds on existing case studies.

The authors tackled the lack of systematic understanding of prompt injection attacks and defenses in LLM-integrated applications by formalizing attacks into a framework and conducting a benchmark evaluation with 5 attacks, 10 defenses, 10 LLMs, and 7 tasks, providing a public platform for future research.

A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.

View on arXiv PDF Code

Similar