AI, CL, LG | Nov 17, 2022

Explainability Via Causal Self-Talk

arXiv:2211.09937v1, 10 citations, h-index: 23
Originality: Incremental advance
AI Analysis

This paper addresses the explainability problem for AI practitioners by offering a pragmatic solution that balances the ambitions of XAI against the practical constraints of deep learning. It appears incremental, since it builds on existing causal modeling ideas.

The paper tackles the problem of explaining AI system behavior by proposing that agents be trained to build causal models of themselves. It instantiates this idea for Deep RL agents and demonstrates, in a simulated 3D environment, that the approach generates faithful and semantically meaningful explanations.

Explaining the behavior of AI systems is an important problem that, in practice, is generally avoided. While the XAI community has been developing an abundance of techniques, most incur a set of costs that the wider deep learning community has been unwilling to pay in most situations. We take a pragmatic view of the issue, and define a set of desiderata that capture both the ambitions of XAI and the practical constraints of deep learning. We describe an effective way to satisfy all the desiderata: train the AI system to build a causal model of itself. We develop an instance of this solution for Deep RL agents: Causal Self-Talk. CST operates by training the agent to communicate with itself across time. We implement this method in a simulated 3D environment, and show how it enables agents to generate faithful and semantically-meaningful explanations of their own behavior. Beyond explanations, we also demonstrate that these learned models provide new ways of building semantic control interfaces to AI systems.

