CL · Oct 15, 2024

Eliciting Textual Descriptions from Representations of Continuous Prompts

DeepMind
arXiv:2410.11660v1 · 3 citations · h-index: 32
Originality: Incremental advance
AI Analysis

The paper gives developers an interpretability method for debugging and mitigating unwanted properties in continuous prompts, addressing a specific bottleneck in parameter-efficient tuning.

The paper tackles the problem of interpreting opaque continuous prompts in large language models. It proposes InSPEcT, a method that elicits textual descriptions from prompt representations during inference. The elicited task descriptions are often accurate and become more faithful as task performance increases, and an elaborated version of the method reveals biased features whose presence correlates with biased model predictions.

Continuous prompts, or "soft prompts", are a widely-adopted parameter-efficient tuning strategy for large language models, but are often less favorable due to their opaque nature. Prior attempts to interpret continuous prompts relied on projecting individual prompt tokens onto the vocabulary space. However, this approach is problematic as performant prompts can yield arbitrary or contradictory text, and it interprets prompt tokens individually. In this work, we propose a new approach to interpret continuous prompts that elicits textual descriptions from their representations during model inference. Using a Patchscopes variant (Ghandeharioun et al., 2024) called InSPEcT over various tasks, we show our method often yields accurate task descriptions which become more faithful as task performance increases. Moreover, an elaborated version of InSPEcT reveals biased features in continuous prompts, whose presence correlates with biased model predictions. Providing an effective interpretability solution, InSPEcT can be leveraged to debug unwanted properties in continuous prompts and inform developers on ways to mitigate them.
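The prior approach mentioned in the abstract, projecting each prompt token onto the vocabulary space, can be illustrated with a minimal sketch. This is not the paper's code and not InSPEcT itself: the function name `project_to_vocab`, the five-word vocabulary, and the one-hot toy embeddings are all assumptions made for illustration, and cosine similarity is just one plausible choice of projection.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def project_to_vocab(prompt_vectors, vocab_embeddings, vocab, top_k=2):
    """Hypothetical helper: for each soft-prompt vector, return the top_k
    vocabulary tokens whose embeddings are most similar to it."""
    results = []
    for vec in prompt_vectors:
        ranked = sorted(
            zip(vocab, vocab_embeddings),
            key=lambda pair: cosine(vec, pair[1]),
            reverse=True,
        )
        results.append([token for token, _ in ranked[:top_k]])
    return results

# Toy setup (assumed, not from the paper): one-hot embeddings for 5 tokens.
vocab = ["positive", "negative", "review", "translate", "banana"]
vocab_embeddings = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

# Two learned "soft prompt" vectors that do not coincide with any real token.
soft_prompt = [
    [0.1, 0.0, 0.9, 0.0, 0.0],
    [0.6, 0.5, 0.0, 0.0, 0.1],
]

nearest = project_to_vocab(soft_prompt, vocab_embeddings, vocab, top_k=2)
print(nearest)  # [['review', 'positive'], ['positive', 'negative']]
```

In this toy case the second prompt vector projects onto both "positive" and "negative", and each vector is read in isolation. Both are limitations the abstract attributes to token-level projection.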

Code Implementations: 1 repo