CL | Oct 15, 2024

Eliciting Textual Descriptions from Representations of Continuous Prompts

Dana Ramati, Daniela Gottesman, Mor Geva

DeepMind

arXiv:2410.11660v13.44 citations | h-index: 32 | Has Code

Originality: Incremental advance

AI Analysis

InSPEcT gives developers an interpretability tool for debugging and mitigating unwanted properties in continuous prompts, addressing the opacity that makes this parameter-efficient tuning strategy hard to trust.

The paper addresses the problem of interpreting opaque continuous prompts in large language models. It proposes InSPEcT, a method that elicits textual descriptions from a prompt's representations during inference. The elicited task descriptions are often accurate and become more faithful as task performance increases, and an elaborated version of the method reveals biased features whose presence correlates with biased predictions.

Continuous prompts, or "soft prompts", are a widely adopted parameter-efficient tuning strategy for large language models, but are often less favorable due to their opaque nature. Prior attempts to interpret continuous prompts relied on projecting individual prompt tokens onto the vocabulary space. However, this approach is problematic because performant prompts can yield arbitrary or contradictory text, and because it interprets prompt tokens individually. In this work, we propose a new approach to interpret continuous prompts that elicits textual descriptions from their representations during model inference. Using a Patchscopes variant (Ghandeharioun et al., 2024) called InSPEcT over various tasks, we show our method often yields accurate task descriptions which become more faithful as task performance increases. Moreover, an elaborated version of InSPEcT reveals biased features in continuous prompts, whose presence correlates with biased model predictions. Providing an effective interpretability solution, InSPEcT can be leveraged to debug unwanted properties in continuous prompts and inform developers of ways to mitigate them.
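To make the prior approach mentioned in the abstract concrete, here is a minimal sketch of per-token vocabulary projection, the baseline the paper argues against. It is not InSPEcT and not the authors' code. The embeddings are toy values, the names `cosine` and `project_to_vocab` are hypothetical, and using cosine similarity as the nearest-neighbor criterion is an assumption (the abstract only says tokens are projected onto the vocabulary space).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def project_to_vocab(soft_prompt, vocab_embeddings):
    """Map each soft-prompt vector to its nearest vocabulary token.

    Each vector is handled independently, which is the limitation the
    abstract points out: no description of the prompt as a whole emerges.
    """
    tokens = []
    for vec in soft_prompt:
        best = max(vocab_embeddings, key=lambda tok: cosine(vec, vocab_embeddings[tok]))
        tokens.append(best)
    return tokens

# Toy 3-dimensional vocabulary embeddings (hypothetical values for illustration).
vocab_embeddings = {
    "positive": [1.0, 0.0, 0.0],
    "negative": [-1.0, 0.0, 0.0],
    "movie": [0.0, 1.0, 0.0],
    "banana": [0.0, 0.0, 1.0],
}

# A toy "learned" soft prompt of three vectors (hypothetical values).
soft_prompt = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [-0.2, 0.1, 0.7],
]

projected = project_to_vocab(soft_prompt, vocab_embeddings)
print(projected)
```

In this toy case the third vector lands on an unrelated token, which loosely illustrates how a prompt can project to arbitrary text when each token is read in isolation.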
