HCAI · Apr 16, 2025

Don't Just Translate, Agitate: Using Large Language Models as Devil's Advocates for AI Explanations

arXiv:2504.12424v1 · 2 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of misleading interpretability in Explainable AI. Because it is a position paper, its contribution is incremental: it proposes a new role for LLMs rather than presenting empirical results.

The paper critiques the use of Large Language Models (LLMs) to translate AI explanations into natural language, arguing it can lead to user overreliance, and proposes instead using LLMs as devil's advocates to interrogate explanations and highlight limitations.

This position paper highlights a growing trend in Explainable AI (XAI) research where Large Language Models (LLMs) are used to translate outputs from explainability techniques, like feature-attribution weights, into a natural language explanation. While this approach may improve accessibility or readability for users, recent findings suggest that translating into human-like explanations does not necessarily enhance user understanding and may instead lead to overreliance on AI systems. When LLMs summarize XAI outputs without surfacing model limitations, uncertainties, or inconsistencies, they risk reinforcing the illusion of interpretability rather than fostering meaningful transparency. We argue that - instead of merely translating XAI outputs - LLMs should serve as constructive agitators, or devil's advocates, whose role is to actively interrogate AI explanations by presenting alternative interpretations, potential biases, training data limitations, and cases where the model's reasoning may break down. In this role, LLMs can facilitate users in engaging critically with AI systems and generated explanations, with the potential to reduce overreliance caused by misinterpreted or specious explanations.
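To make the contrast concrete, here is a minimal sketch of the two roles as prompt builders. It is an illustration only, not the authors' method: the function names, the prompt wording, and the example features are all hypothetical, and no LLM is called. The four critique angles are the ones listed in the abstract.

```python
from typing import Dict

# Critique angles taken from the abstract; everything else here is assumed.
CRITIQUE_ANGLES = [
    "alternative interpretations",
    "potential biases",
    "training data limitations",
    "cases where the model's reasoning may break down",
]


def format_attributions(attributions: Dict[str, float]) -> str:
    """List features by absolute attribution weight, largest first."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return "\n".join(f"- {name}: {weight:+.2f}" for name, weight in ranked)


def translator_prompt(prediction: str, attributions: Dict[str, float]) -> str:
    """Hypothetical 'translate' prompt: restate the XAI output in plain language."""
    return (
        f"The model predicted: {prediction}\n"
        f"Feature attributions:\n{format_attributions(attributions)}\n"
        "Explain in plain language why the model made this prediction."
    )


def devils_advocate_prompt(prediction: str, attributions: Dict[str, float]) -> str:
    """Hypothetical 'agitate' prompt: ask the LLM to interrogate the explanation."""
    angles = "\n".join(f"{i}. {angle}" for i, angle in enumerate(CRITIQUE_ANGLES, 1))
    return (
        f"The model predicted: {prediction}\n"
        f"Feature attributions:\n{format_attributions(attributions)}\n"
        "Do not simply restate these attributions. "
        "Challenge this explanation by raising:\n"
        f"{angles}"
    )


if __name__ == "__main__":
    example = {"age": 0.10, "income": -0.40}  # made-up weights for illustration
    print(devils_advocate_prompt("loan denied", example))
```

The two builders share the same inputs and differ only in the instruction given to the LLM, which is the distinction the abstract draws: the translator summarizes the attribution weights, while the devil's advocate is asked to surface limitations the summary would otherwise hide.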

