CL · May 19

Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations

Somnath Banerjee, Pranav Jha, Rima Hazra, Animesh Mukherjee

arXiv:2605.19274

AI Analysis

For practitioners auditing multilingual LLMs, this work highlights that English explanations can be misleading, recommending native-language auditing and multi-faceted faithfulness metrics.

The paper identifies a systematic trade-off in cross-lingual explanations: English-pivot explanations achieve higher span agreement with human rationales but are less causally grounded, with comprehensiveness degrading by up to 5.7x compared to native-language explanations, even as task accuracy remains stable.

LLMs deployed multilingually are often audited via English explanations for non-English inputs. We evaluate extractive explanations (where the model identifies input token spans as evidence alongside a generated rationale) and uncover a systematic trade-off: English-pivot explanations can achieve higher span agreement with human rationales while their evidence becomes less causally grounded in the model's prediction, as measured by both comprehensiveness and sufficiency. Across 3 tasks, 5 languages, and 2 multilingual LLM families, we find that English explanations frequently produce fluent but loosely anchored rationales, with comprehensiveness degrading by up to 5.7x relative to native-language conditions, even as task accuracy remains stable across settings. For socially nuanced classification, English pivots also fail to preserve pragmatic cues, reducing both faithfulness and span agreement. We recommend auditing explanations in the input language, reporting multi-faceted faithfulness metrics beyond lexical overlap, and treating English rationales as communication summaries rather than faithful decision traces.
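
The abstract names comprehensiveness and sufficiency without defining them. The sketch below assumes the commonly used erasure-based definitions: comprehensiveness is the drop in the predicted-class probability when the evidence spans are removed from the input, and sufficiency is the drop when only the evidence spans are kept. The paper's exact formulation may differ. The names `toy_predict`, `comprehensiveness`, and `sufficiency` are hypothetical, and the toy classifier stands in for a real model only to make the example runnable.

```python
from typing import Callable, List, Sequence, Set

# A predictor maps a token list to class probabilities (assumed interface).
Predictor = Callable[[Sequence[str]], List[float]]


def comprehensiveness(predict: Predictor, tokens: Sequence[str],
                      rationale: Set[int], label: int) -> float:
    """Probability drop when the rationale tokens are REMOVED.

    Higher means the evidence mattered more to the prediction.
    """
    full = predict(tokens)[label]
    without = predict([t for i, t in enumerate(tokens) if i not in rationale])[label]
    return full - without


def sufficiency(predict: Predictor, tokens: Sequence[str],
                rationale: Set[int], label: int) -> float:
    """Probability drop when ONLY the rationale tokens are kept.

    Lower means the evidence alone nearly reproduces the prediction.
    """
    full = predict(tokens)[label]
    only = predict([t for i, t in enumerate(tokens) if i in rationale])[label]
    return full - only


def toy_predict(tokens: Sequence[str]) -> List[float]:
    """Toy stand-in for a model: 'great' drives the positive class."""
    p_pos = 0.9 if "great" in tokens else 0.5
    return [1.0 - p_pos, p_pos]


tokens = ["the", "film", "was", "great"]
anchored = {3}   # evidence span the toy model actually relies on
loose = {0, 1}   # plausible-looking span the toy model ignores

for name, span in [("anchored", anchored), ("loose", loose)]:
    comp = comprehensiveness(toy_predict, tokens, span, label=1)
    suff = sufficiency(toy_predict, tokens, span, label=1)
    print(f"{name}: comprehensiveness={comp:.2f} sufficiency={suff:.2f}")
```

In this toy setting the anchored span scores high on comprehensiveness and near zero on sufficiency, while the loose span shows the opposite pattern. That opposite pattern is the kind of "fluent but loosely anchored" evidence the abstract describes, and span overlap with human rationales alone would not reveal it.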
