LG | CL | CR | CV | Jun 1, 2022

Discovering the Hidden Vocabulary of DALLE-2

arXiv:2206.00169v1 | 76 citations | h-index: 71
Originality: Incremental advance
AI Analysis

This work highlights a security vulnerability in large-scale generative models that creates risks of misuse and challenges for interpretability, though its contribution is incremental: it exposes specific model quirks.

The researchers discovered that DALLE-2 has a hidden vocabulary of seemingly random strings that often generate specific visual concepts, such as birds or bugs, revealing potential security and interpretability issues.

We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that "Apoploe vesrreaitais" means birds and "Contarra ccetnxniams luryca tanniounons" (sometimes) means bugs or pests. We find that these prompts are often consistent in isolation but also sometimes in combinations. We present our black-box method to discover words that seem random but have some correspondence to visual concepts. This creates important security and interpretability challenges.

