CL · AI · LG · Jun 17, 2024

CELL your Model: Contrastive Explanations for Large Language Models

arXiv:2406.11785v3 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for interpretability in generative AI for users who rely on LLMs, though the advance is incremental because it builds on existing contrastive explanation concepts.

The paper tackles the problem of explaining outputs from large language models (LLMs) by proposing a contrastive explanation method that identifies how slight prompt modifications would lead to less preferable or contradictory responses, demonstrating efficacy on tasks like open-text generation and chatbot conversations.

The advent of black-box deep neural network classification models has sparked the need to explain their decisions. However, in the case of generative AI, such as large language models (LLMs), there is no class prediction to explain. Rather, one can ask why an LLM output a particular response to a given prompt. In this paper, we answer this question by proposing a contrastive explanation method requiring simply black-box/query access. Our explanations suggest that an LLM outputs a reply to a given prompt because if the prompt was slightly modified, the LLM would have given a different response that is either less preferable or contradicts the original response. The key insight is that contrastive explanations simply require a scoring function that has meaning to the user and not necessarily a specific real valued quantity (viz. class label). To this end, we offer a novel budgeted algorithm, our main algorithmic contribution, which intelligently creates contrasts based on such a scoring function while adhering to a query budget, necessary for longer contexts. We show the efficacy of our method on important natural language tasks such as open-text generation and chatbot conversations.
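The idea of a budgeted contrastive search can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a naive single-word-deletion search, and every name in it (`contrastive_search`, `toy_model`, `toy_score`, `threshold`, `budget`) is a hypothetical stand-in. It assumes only what the abstract states: black-box query access to the model, a user-meaningful scoring function, and a cap on the number of queries.

```python
from typing import Callable, Optional, Tuple


def contrastive_search(
    prompt: str,
    model: Callable[[str], str],
    score: Callable[[str, str], float],
    threshold: float,
    budget: int,
) -> Optional[Tuple[str, str, float]]:
    """Naive illustration: delete one word at a time until the response changes enough.

    Returns (modified_prompt, new_response, score), or None if no contrast
    is found before the query budget runs out.
    """
    words = prompt.split()
    original = model(prompt)  # the first query fetches the response to explain
    queries = 1
    for i in range(len(words)):
        if queries >= budget:
            break  # respect the query budget
        candidate = " ".join(words[:i] + words[i + 1:])  # drop the i-th word
        response = model(candidate)
        queries += 1
        s = score(original, response)
        if s >= threshold:
            return candidate, response, s
    return None


# Hypothetical stand-ins for a real LLM and a real scoring function.
def toy_model(prompt: str) -> str:
    return "I recommend it." if "great" in prompt.split() else "I do not recommend it."


def toy_score(original: str, response: str) -> float:
    return 0.0 if original == response else 1.0


result = contrastive_search("the food was great today", toy_model, toy_score, 0.5, 10)
```

Here the contrast found is the prompt with "great" removed, which reads as: the model recommended the item because, without "great", it would not have. A practical scoring function might measure preference or contradiction between the two responses, and a smarter search would choose which parts of the prompt to modify rather than scanning left to right.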

