CL · LG · Nov 22, 2023

Language Model Inversion

arXiv:2311.13647v1 · 72 citations · h-index: 69 · Has Code
Originality: Highly original
AI Analysis

This paper addresses a security and privacy issue for users of language models: it shows that prompts meant to stay hidden can be recovered from the model's outputs, which undermines prompt confidentiality.

The paper tackles the problem of recovering hidden prompt tokens from a language model's next-token probability distribution. It shows that this distribution carries enough information to reconstruct prompts with a BLEU score of 59 and a token-level F1 of 78, and to recover 27% of prompts exactly, on Llama-2 7b.
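To make the reported metrics concrete, here is a minimal sketch of exact match and token-level F1 between a recovered prompt and the true prompt. It assumes whitespace tokenization and bag-of-tokens overlap; the paper's own tokenization and scoring code may differ, and the function names `token_f1` and `exact_match` are illustrative, not taken from the paper's repository.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 between two strings.

    Assumption: tokens are whitespace-separated words, which may not
    match the tokenization used in the paper's evaluation.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty counts as zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, reference: str) -> bool:
    """True only when the recovered prompt equals the reference exactly."""
    return prediction.strip() == reference.strip()
```

Under these assumptions, a recovered prompt that gets three of four words right scores well on token-level F1 but still fails exact match, which is one way an F1 of 78 can sit alongside a 27% exact-recovery rate.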

Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the model's current distribution output. We consider a variety of model access scenarios, and show how even without predictions for every token in the vocabulary we can recover the probability vector through search. On Llama-2 7b, our inversion method reconstructs prompts with a BLEU of 59 and token-level F1 of 78 and recovers 27% of prompts exactly. Code for reproducing all experiments is available at http://github.com/jxmorris12/vec2text.
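The abstract says the probability vector can be recovered "through search" when the model does not expose predictions for every token. One way this could work is sketched below, under assumptions that are not stated in the abstract: the API returns only the most likely next token, and it accepts an additive per-token logit bias. Binary search then finds, for each token, the smallest bias that makes it the top token, which equals its logit gap to the original top token. The helper names `make_argmax_api` and `recover_probs` are hypothetical, and the API is simulated locally.

```python
import math
from typing import Callable, Dict, List


def make_argmax_api(hidden_logits: List[float]) -> Callable[[Dict[int, float]], int]:
    """Simulated restricted API (an assumption for illustration): it hides
    the logits and returns only the argmax token id after applying a bias."""
    def api(logit_bias: Dict[int, float]) -> int:
        biased = list(hidden_logits)
        for token_id, bias in logit_bias.items():
            biased[token_id] += bias
        return max(range(len(biased)), key=biased.__getitem__)
    return api


def recover_probs(
    api: Callable[[Dict[int, float]], int],
    vocab_size: int,
    max_bias: float = 40.0,
    steps: int = 30,
) -> List[float]:
    """Estimate the full next-token distribution from argmax-only access.

    Assumes every token's logit gap to the top token is below max_bias.
    """
    top = api({})
    # gaps[i] approximates logit[top] - logit[i]; the top token has gap 0.
    gaps = [0.0] * vocab_size
    for token_id in range(vocab_size):
        if token_id == top:
            continue
        lo, hi = 0.0, max_bias
        for _ in range(steps):
            mid = (lo + hi) / 2
            if api({token_id: mid}) == token_id:
                hi = mid  # bias is large enough; try a smaller one
            else:
                lo = mid  # bias is too small; try a larger one
        gaps[token_id] = hi
    # Softmax is invariant to a constant shift, so the gaps alone
    # determine the probabilities.
    weights = [math.exp(-gap) for gap in gaps]
    total = sum(weights)
    return [w / total for w in weights]


# Usage with a toy four-token vocabulary.
hidden = [2.0, 0.5, -1.0, 1.2]
recovered = recover_probs(make_argmax_api(hidden), len(hidden))
```

This sketch costs `steps` queries per token, so it is only practical as an illustration; a real vocabulary would need many queries, and the paper's actual search procedure may be organized differently.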

Code Implementations: 2 repos
