LG CL CR CV

Jun 1, 2022

Discovering the Hidden Vocabulary of DALLE-2

arXiv:2206.00169v1

24.977 citations

h-index: 71

Originality: Incremental advance

AI Analysis

This work highlights a security vulnerability in large-scale generative models that creates risks of misuse and challenges for interpretability, though the advance is incremental, since it mainly exposes specific model quirks.

The researchers found that DALLE-2 appears to have a hidden vocabulary: seemingly random strings of text that often generate images of specific visual concepts, such as birds or bugs. This points to potential security and interpretability issues.

We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that "Apoploe vesrreaitais" means birds and "Contarra ccetnxniams luryca tanniounons" (sometimes) means bugs or pests. We find that these prompts are often consistent in isolation but also sometimes in combinations. We present our black-box method to discover words that seem random but have some correspondence to visual concepts. This creates important security and interpretability challenges.
