CL · AI · Apr 4, 2024

The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models

Noah Y. Siegel, Oana-Maria Camburu, Nicolas Heess, Maria Perez-Ortiz

arXiv:2404.03189v2 · 19.838 citations · h-index: 72 · ACL

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making AI systems transparent enough to oversee by providing a more accurate metric for assessing explanation faithfulness. The contribution is incremental, since it builds on existing tests.

The authors tackle the problem of evaluating the faithfulness of free-text explanations produced by large language models. They introduce a new metric, Correlational Explanatory Faithfulness (CEF), which accounts for shifts in the predicted label distribution rather than only binary changes in the prediction. In tests on Llama2 models across three NLP tasks, they find that it captures aspects of faithfulness that previous methods miss.

In order to oversee advanced AI systems, it is important to understand their underlying decision-making process. When prompted, large language models (LLMs) can provide natural language explanations or reasoning traces that sound plausible and receive high ratings from human annotators. However, it is unclear to what extent these explanations are faithful, i.e., truly capture the factors responsible for the model's predictions. In this work, we introduce Correlational Explanatory Faithfulness (CEF), a metric that can be used in faithfulness tests based on input interventions. Previous metrics used in such tests take into account only binary changes in the predictions. Our metric accounts for the total shift in the model's predicted label distribution, more accurately reflecting the explanations' faithfulness. We then introduce the Correlational Counterfactual Test (CCT) by instantiating CEF on the Counterfactual Test (CT) from Atanasova et al. (2023). We evaluate the faithfulness of free-text explanations generated by few-shot-prompted LLMs from the Llama2 family on three NLP tasks. We find that our metric measures aspects of faithfulness which the CT misses.
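The contrast drawn in the abstract, between a binary change test and a metric over the total shift in the label distribution, can be illustrated with a minimal sketch. It rests on several assumptions of our own, not details stated on this page:

- The shift for each intervention is measured as the total variation distance between the label distributions before and after the intervention.
- Whether the explanation mentions the inserted text is recorded as a binary flag.
- The CEF-style score is the Pearson correlation between shift and mention.
- The binary baseline is a simplified CT-style rate that only checks whether the top label flips.

All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

# Predicted probabilities over a fixed label set (assumed representation).
Dist = Sequence[float]


def total_variation(p: Dist, q: Dist) -> float:
    """Total variation distance between two label distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation; raises if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def cct_score(before: List[Dist], after: List[Dist], mentioned: List[bool]) -> float:
    """Hypothetical CCT-style score: correlate the size of each intervention's
    prediction shift with whether the explanation mentions the inserted text."""
    shifts = [total_variation(p, q) for p, q in zip(before, after)]
    return pearson(shifts, [1.0 if m else 0.0 for m in mentioned])


def ct_unfaithful_rate(before: List[Dist], after: List[Dist], mentioned: List[bool]) -> float:
    """Simplified binary baseline: among interventions that flip the top label,
    the fraction whose explanation does NOT mention the inserted text."""
    def top(d: Dist) -> int:
        return max(range(len(d)), key=lambda i: d[i])

    flipped = [m for p, q, m in zip(before, after, mentioned) if top(p) != top(q)]
    if not flipped:
        return 0.0
    return sum(1 for m in flipped if not m) / len(flipped)


# Toy example: four interventions on the same two-label input.
before = [[0.9, 0.1]] * 4
after = [[0.1, 0.9], [0.55, 0.45], [0.9, 0.1], [0.6, 0.4]]
mentioned = [True, False, False, True]

print(ct_unfaithful_rate(before, after, mentioned))  # 0.0: the only label flip is mentioned
print(round(cct_score(before, after, mentioned), 3))  # 0.656
```

In this toy data the binary rate looks perfect, because the single label flip is explained. The second intervention shifts the distribution substantially (total variation 0.35) without flipping the label, and the explanation ignores it. The binary rate cannot see that case, whereas the correlation drops below 1.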

