CL · AI · Jul 25, 2023

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

arXiv:2307.13339v1 · 42 citations · h-index: 43 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work offers incremental insights into LLM interpretability for researchers, which may aid the responsible deployment of LLMs.

The paper asks why chain-of-thought prompting improves LLM accuracy. Using gradient-based feature attributions to measure how much each input token matters, it finds that chain-of-thought prompting makes saliency scores more robust to perturbations rather than increasing their magnitude on relevant tokens.
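To make "gradient-based feature attribution" concrete, here is a minimal sketch of one common variant, gradient × input, under stated assumptions. The scoring function `toy_logit`, its weights `W`, the tokens, and their two-dimensional embeddings are all hypothetical stand-ins for an LLM's output logit and embedding layer. The gradient is approximated with central finite differences rather than the automatic differentiation a real model would use. This is not claimed to be the exact attribution method or aggregation used in the paper.

```python
import math
from typing import Callable, List

Vector = List[float]


def numerical_grad(f: Callable[[List[Vector]], float],
                   emb: List[Vector], eps: float = 1e-5) -> List[Vector]:
    """Central-difference gradient of scalar f w.r.t. every embedding coordinate."""
    grads: List[Vector] = []
    for i, vec in enumerate(emb):
        g: Vector = []
        for d in range(len(vec)):
            plus = [v[:] for v in emb]   # copy so the original embeddings are untouched
            minus = [v[:] for v in emb]
            plus[i][d] += eps
            minus[i][d] -= eps
            g.append((f(plus) - f(minus)) / (2 * eps))
        grads.append(g)
    return grads


def grad_x_input_saliency(f: Callable[[List[Vector]], float],
                          emb: List[Vector]) -> List[float]:
    """Per-token |grad . embedding|, normalized so the scores sum to 1."""
    grads = numerical_grad(f, emb)
    raw = [abs(sum(g_d * x_d for g_d, x_d in zip(g, x))) for g, x in zip(grads, emb)]
    total = sum(raw) or 1.0  # avoid division by zero if every score is 0
    return [r / total for r in raw]


# Hypothetical toy "model": tanh of a linear score summed over token embeddings.
W = [0.5, -1.0]


def toy_logit(emb: List[Vector]) -> float:
    return math.tanh(sum(sum(w * x for w, x in zip(W, vec)) for vec in emb))


tokens = ["How", "many", "apples", "?"]
emb = [[0.1, 0.0], [0.2, 0.1], [1.0, -0.8], [0.0, 0.05]]
saliency = grad_x_input_saliency(toy_logit, emb)
for tok, s in zip(tokens, saliency):
    print(f"{tok:>7}: {s:.3f}")
```

Each token's score is the absolute dot product between the gradient of the output with respect to that token's embedding and the embedding itself, normalized across tokens; in this toy setup the hypothetical "apples" token receives most of the attribution.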

Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks. While understanding why CoT prompting is effective is crucial to ensuring that this phenomenon is a consequence of desired model behavior, little work has addressed this; nonetheless, such an understanding is a critical prerequisite for responsible model deployment. We address this question by leveraging gradient-based feature attribution methods which produce saliency scores that capture the influence of input tokens on model output. Specifically, we probe several open-source LLMs to investigate whether CoT prompting affects the relative importances they assign to particular input tokens. Our results indicate that while CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt compared to standard few-shot prompting, it increases the robustness of saliency scores to question perturbations and variations in model output.
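The robustness result can be illustrated by comparing saliency vectors across perturbed versions of a question. The sketch below assumes, as our own choice rather than the paper's stated metric, that robustness is measured with Spearman rank correlation between saliency scores over aligned token positions (for example, a perturbation that only swaps a number). The functions `spearman` and `mean_robustness` are hypothetical helpers, and the saliency values are made up for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(va * vb)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def mean_robustness(original: Sequence[float],
                    perturbed: List[Sequence[float]]) -> float:
    """Hypothetical robustness score: mean rank correlation over perturbations."""
    return sum(spearman(original, p) for p in perturbed) / len(perturbed)


# Made-up saliency scores over the same four token positions.
original = [0.05, 0.10, 0.60, 0.25]
perturbed_a = [0.04, 0.12, 0.55, 0.29]  # same importance ranking
perturbed_b = [0.30, 0.10, 0.20, 0.40]  # ranking reshuffled
print(spearman(original, perturbed_a), spearman(original, perturbed_b))
print(mean_robustness(original, [perturbed_a, perturbed_b]))
```

A correlation near 1 means the perturbation left the token-importance ranking intact, while a value near 0 means the ranking was reshuffled; under this assumed metric, a prompting strategy whose saliency scores stay highly correlated across perturbations would count as more robust.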

