LG · Feb 8, 2023

Black Box Adversarial Prompting for Foundation Models

arXiv:2302.04237v2 · 81 citations · h-index: 16
AI Analysis

The work addresses security and reliability issues for users of foundation models, but it is incremental because it builds on existing adversarial techniques.

The paper tackles the problem that small prompt changes can cause large variations in the output of generative models. It develops a black-box framework for generating adversarial prompts that induce specific behaviors in image and text generation, such as producing a particular object or high-perplexity text.

Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language. However, small changes and design choices in the prompt can lead to significant differences in the output. In this work, we develop a black-box framework for generating adversarial prompts for unstructured image and text generation. These prompts, which can be standalone or prepended to benign prompts, induce specific behaviors into the generative process, such as generating images of a particular object or generating high perplexity text.
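
To make the black-box setting concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not name an optimizer, so plain random search stands in for it. The vocabulary `VOCAB`, the scoring function `toy_score`, and the helper `random_search_prefix` are all hypothetical; `toy_score` is a stand-in for "query the model and measure the target behavior". The only point is that the search sees scores, never gradients or model internals.

```python
import random

# Hypothetical toy vocabulary and target; a real setup would query a generative model.
VOCAB = ["red", "blue", "quiet", "metal", "river", "stone", "cloud", "paper"]
HIDDEN_TARGET = {"river", "cloud"}


def toy_score(prompt: str) -> int:
    """Stand-in for a black-box query: counts words that trigger the toy behavior."""
    return sum(word in HIDDEN_TARGET for word in prompt.split())


def random_search_prefix(score, benign, vocab, prefix_len=4, n_queries=200, seed=0):
    """Search for a prefix that raises score(prefix + benign) using queries only."""
    rng = random.Random(seed)
    best = [rng.choice(vocab) for _ in range(prefix_len)]
    best_score = score(" ".join(best + [benign]))
    for _ in range(n_queries):
        cand = list(best)
        # Mutate one randomly chosen position of the prefix.
        cand[rng.randrange(prefix_len)] = rng.choice(vocab)
        cand_score = score(" ".join(cand + [benign]))
        # Keep the candidate only if the black-box score strictly improves.
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best, best_score


benign_prompt = "a photo of a mountain"
prefix, best_score = random_search_prefix(toy_score, benign_prompt, VOCAB)
full_prompt = " ".join(prefix + [benign_prompt])
print(full_prompt, best_score)
```

The prefix is prepended to an unchanged benign prompt, mirroring the "prepended to benign prompts" case in the abstract. Swapping `toy_score` for a function that calls a model and measures, say, a classifier's confidence for a target object would keep the search loop unchanged, though the query budget would then dominate the cost.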

Code Implementations: 1 repo
