CL · AI · LG · ME · Feb 26, 2025

A Causal Lens for Evaluating Faithfulness Metrics

arXiv:2502.18848v2 · 12 citations · h-index: 6 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the lack of a principled way to compare faithfulness metrics in model interpretability, a prerequisite for understanding LLM decision-making. The advance is incremental: it builds on existing evaluation methods.

The authors evaluate faithfulness metrics for natural language explanations from LLMs by introducing the Causal Diagnosticity framework, which uses model editing to generate faithful-unfaithful explanation pairs across four tasks. Filler Tokens performed best overall; continuous metrics were generally more diagnostic than binary ones but were sensitive to noise and model choice.

Large Language Models (LLMs) offer natural language explanations as an alternative to feature attribution methods for model interpretability. However, despite their plausibility, these explanations may not faithfully reflect the model's true reasoning, and faithfulness is crucial for understanding the model's decision-making processes. Although several faithfulness metrics have been proposed, they are often evaluated in isolation, making direct, principled comparisons between them difficult. Here, we present Causal Diagnosticity, a framework that serves as a common testbed to evaluate faithfulness metrics for natural language explanations. Our framework employs the concept of diagnosticity and uses model-editing methods to generate faithful-unfaithful explanation pairs. Our benchmark includes four tasks: fact-checking, analogy, object counting, and multi-hop reasoning. We evaluate prominent faithfulness metrics, including post-hoc explanation and chain-of-thought-based methods. We find that diagnostic performance varies across tasks and models, with Filler Tokens performing best overall. Additionally, continuous metrics are generally more diagnostic than binary ones but can be sensitive to noise and model choice. Our results highlight the need for more robust faithfulness metrics.
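As a rough illustration of how a metric might be scored on faithful-unfaithful pairs, the sketch below assumes diagnosticity is estimated as the fraction of pairs in which the metric assigns the faithful explanation a higher score than the unfaithful one. The function name `diagnosticity`, the `tie_credit` parameter (ties counted as half), and all scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def diagnosticity(pairs: Sequence[Tuple[float, float]], tie_credit: float = 0.5) -> float:
    """Estimate diagnosticity over (faithful_score, unfaithful_score) pairs.

    Assumption: a pair counts as correct when the faithful explanation
    receives the strictly higher score; ties earn `tie_credit`.
    """
    if not pairs:
        raise ValueError("need at least one explanation pair")
    total = 0.0
    for faithful_score, unfaithful_score in pairs:
        if faithful_score > unfaithful_score:
            total += 1.0
        elif faithful_score == unfaithful_score:
            total += tie_credit
    return total / len(pairs)


# Made-up scores from a hypothetical continuous metric, one pair per example.
continuous_pairs = [(0.9, 0.2), (0.6, 0.5), (0.4, 0.7), (0.8, 0.1)]

# The same scores thresholded at 0.5 to mimic a binary metric.
binary_pairs = [
    (1.0 if f >= 0.5 else 0.0, 1.0 if u >= 0.5 else 0.0)
    for f, u in continuous_pairs
]

print(diagnosticity(continuous_pairs))  # 0.75
print(diagnosticity(binary_pairs))      # 0.625
```

In this toy data, thresholding turns the second pair into a tie, which lowers the estimate. That is one plausible reason a binary metric could be less diagnostic than a continuous one, though the example does not reproduce the paper's results.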

