LG, CL | Mar 14, 2025

Reasoning-Grounded Natural Language Explanations for Language Models

arXiv:2503.11248v1 | 6 citations | h-index: 2 | xAI
Originality: Incremental advance
AI Analysis

The paper addresses the need for more interpretable AI systems by proposing a method to improve explanation faithfulness, though it appears incremental because it builds on existing reasoning techniques.

The paper tackles the problem of generating faithful natural language explanations for language models by grounding them in a reasoning process, achieving high alignment between answers and explanations in several domains.

We propose a large language model explainability technique for obtaining faithful natural language explanations by grounding the explanations in a reasoning process. When converted to a sequence of tokens, the outputs of the reasoning process can become part of the model context and later be decoded to natural language as the model produces either the final answer or the explanation. To improve the faithfulness of the explanations, we propose to use a joint predict-explain approach, in which the answers and explanations are inferred directly from the reasoning sequence, without the explanations being dependent on the answers and vice versa. We demonstrate the plausibility of the proposed technique by achieving a high alignment between answers and explanations in several problem domains, observing that language models often simply copy the partial decisions from the reasoning sequence into the final answers or explanations. Furthermore, we show that the proposed use of reasoning can also improve the quality of the answers.
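The joint predict-explain idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `generate` callable, the prompt layout, and the rule-based `toy_generate` stand-in are all assumptions made for illustration. The point shown is only structural: the reasoning sequence is produced once, and the answer and the explanation are each decoded from the same question-plus-reasoning context, so neither is conditioned on the other.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a language model: maps a prompt to a completion.
Generate = Callable[[str], str]


def joint_predict_explain(question: str, generate: Generate) -> Dict[str, str]:
    """Decode answer and explanation independently from one reasoning sequence."""
    # Step 1: produce the reasoning sequence once.
    reasoning = generate(f"Question: {question}\nReasoning:")

    # Step 2: the shared context holds the question and the reasoning only.
    context = f"Question: {question}\nReasoning: {reasoning}\n"

    # Step 3: two separate decodes. The answer prompt never sees the
    # explanation, and the explanation prompt never sees the answer.
    answer = generate(context + "Answer:")
    explanation = generate(context + "Explanation:")

    return {"reasoning": reasoning, "answer": answer, "explanation": explanation}


def toy_generate(prompt: str) -> str:
    """Rule-based toy 'model' (an assumption, used only to make the sketch runnable).

    It mimics the copying behaviour mentioned in the abstract: the partial
    decision in the reasoning sequence is reused in the answer and explanation.
    """
    if prompt.endswith("Reasoning:"):
        return "compare(7, 5) = greater"
    # Recover the reasoning line and the partial decision it contains.
    reasoning = prompt.split("Reasoning: ", 1)[1].split("\n", 1)[0]
    decision = reasoning.split(" = ")[1]
    if prompt.endswith("Answer:"):
        return "yes" if decision == "greater" else "no"
    if prompt.endswith("Explanation:"):
        return f"The first number is {decision} than the second."
    raise ValueError("unexpected prompt")


result = joint_predict_explain("Is 7 greater than 5?", toy_generate)
print(result["answer"])
print(result["explanation"])
```

With a real model, `generate` would wrap whatever decoding call is available. Keeping the two decodes separate is the design choice that matters here, since it avoids an explanation that merely rationalizes an answer that was already emitted.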

Code Implementations: 1 repo