How Persuasive is Your Context?
This work addresses the need for better evaluation metrics in NLP for understanding model behavior, though its contribution is incremental because it builds on existing work on persuasiveness.
The paper tackles the problem of quantifying how persuasive a given context is in altering a language model's answer distribution, introducing the targeted persuasion score (TPS) based on Wasserstein distance, and shows empirically that TPS captures a more nuanced notion of persuasiveness than previous metrics.
Two central capabilities of language models (LMs) are: (i) drawing on prior knowledge about entities, which allows them to answer queries such as "What's the official language of Austria?", and (ii) adapting to new information provided in context, e.g., "Pretend the official language of Austria is Tagalog.", prepended to the question. In this article, we introduce the targeted persuasion score (TPS), designed to quantify how persuasive a given context is to an LM, where persuasion is operationalized as the ability of the context to alter the LM's answer to the question. In contrast to evaluating persuasiveness only by inspecting the greedily decoded answer under the model, TPS provides a more fine-grained view of model behavior. Based on the Wasserstein distance, TPS measures how much a context shifts a model's original answer distribution toward a target distribution. Empirically, through a series of experiments, we show that TPS captures a more nuanced notion of persuasiveness than previously proposed metrics.
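To make the idea of a shift toward a target distribution concrete, the following is a minimal sketch and not the definition of TPS, which the abstract does not spell out. It assumes the target distribution is a point mass on a single target answer, in which case the Wasserstein-1 distance reduces to the expected ground cost between a sampled answer and the target. The function names, the 0/1 ground cost, and the probabilities are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict


def zero_one_cost(answer: str, target: str) -> float:
    # Assumed ground cost: 0 if the answer equals the target, else 1.
    return 0.0 if answer == target else 1.0


def distance_to_target(
    dist: Dict[str, float],
    target: str,
    cost: Callable[[str, str], float] = zero_one_cost,
) -> float:
    # Wasserstein-1 distance from `dist` to a point mass on `target`.
    # With a point-mass target the only coupling moves all mass to the
    # target, so the distance is the expected cost under `dist`.
    total = sum(dist.values())  # normalise in case of rounding
    return sum(p / total * cost(a, target) for a, p in dist.items())


def shift_toward_target(
    prior: Dict[str, float],
    with_context: Dict[str, float],
    target: str,
    cost: Callable[[str, str], float] = zero_one_cost,
) -> float:
    # Illustrative quantity (an assumption, not the paper's score):
    # how much closer to the target the context moves the distribution.
    return distance_to_target(prior, target, cost) - distance_to_target(
        with_context, target, cost
    )


# Made-up answer distributions for "What's the official language of Austria?"
prior = {"German": 0.90, "English": 0.09, "Tagalog": 0.01}
weak_context = {"German": 0.50, "English": 0.10, "Tagalog": 0.40}

shift = shift_toward_target(prior, weak_context, "Tagalog")
greedy_before = max(prior, key=prior.get)
greedy_after = max(weak_context, key=weak_context.get)
print(f"shift toward target: {shift:.2f}")
print(f"greedy answer before/after: {greedy_before} / {greedy_after}")
```

In this toy case the greedily decoded answer is "German" both with and without the context, so a greedy-only evaluation would register no persuasion, while the distributional shift is clearly nonzero. A richer ground cost (for example, a semantic distance between answers) could be passed as `cost`, but that choice is again an assumption here.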