AI · Sep 10, 2020

TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments

arXiv:2009.04743v2 · 34 citations
AI Analysis

This paper addresses the need for trust and validation among users of autonomous agents, a core concern of explainable AI. It is an incremental step, since it builds on existing behaviourist and conceptual spaces approaches.

The paper tackles the problem of interpreting black-box autonomous agents by discretizing the state space into convex regions to capture action, value, and temporal similarities, resulting in a representation that enables prediction, visualization, and rule-based explanation.

In explainable artificial intelligence, there is increasing interest in understanding the behaviour of autonomous agents to build trust and validate performance. Modern agent architectures, such as those trained by deep reinforcement learning, are currently so lacking in interpretable structure as to effectively be black boxes, but insights may still be gained from an external, behaviourist perspective. Inspired by conceptual spaces theory, we suggest that a versatile first step towards general understanding is to discretise the state space into convex regions, jointly capturing similarities over the agent's action, value function and temporal dynamics within a dataset of observations. We create such a representation using a novel variant of the CART decision tree algorithm, and demonstrate how it facilitates practical understanding of black box agents through prediction, visualisation and rule-based explanation.
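The core idea in the abstract, choosing splits that jointly reduce impurity in the agent's action, value and temporal dynamics, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`best_split`, `impurity`), the use of Gini impurity for actions, plain variance for values and state derivatives, and the equal default weights are all assumptions made for illustration. Axis-aligned thresholds are used because they partition the state space into hyperrectangles, which are convex.

```python
from statistics import pvariance


def gini(labels):
    """Gini impurity of a list of discrete action labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for a in labels:
        counts[a] = counts.get(a, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def variance(xs):
    """Population variance, defined as 0 for fewer than two samples."""
    return pvariance(xs) if len(xs) > 1 else 0.0


def impurity(idx, actions, values, derivs, weights):
    """Weighted sum of three impurities over the samples in idx (assumed form)."""
    w_a, w_v, w_d = weights
    imp_a = gini([actions[i] for i in idx])
    imp_v = variance([values[i] for i in idx])
    # Temporal-dynamics impurity: summed per-dimension variance of state derivatives.
    dims = len(derivs[0])
    imp_d = sum(variance([derivs[i][d] for i in idx]) for d in range(dims))
    return w_a * imp_a + w_v * imp_v + w_d * imp_d


def best_split(states, actions, values, derivs, weights=(1.0, 1.0, 1.0)):
    """Return (score, dim, threshold) of the best axis-aligned split, or None."""
    n = len(states)
    best = None
    for d in range(len(states[0])):
        xs = sorted(set(s[d] for s in states))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(xs, xs[1:]):
            t = (lo + hi) / 2
            left = [i for i in range(n) if states[i][d] < t]
            right = [i for i in range(n) if states[i][d] >= t]
            score = (
                len(left) * impurity(left, actions, values, derivs, weights)
                + len(right) * impurity(right, actions, values, derivs, weights)
            ) / n
            if best is None or score < best[0]:
                best = (score, d, t)
    return best


# Toy observations: the first state dimension separates two behaviour regimes.
states = [(0.1, 0.5), (0.2, 0.9), (0.8, 0.4), (0.9, 0.8)]
actions = ["left", "left", "right", "right"]
values = [1.0, 1.0, 5.0, 5.0]
derivs = [(0.1, 0.0), (0.1, 0.0), (-0.1, 0.0), (-0.1, 0.0)]
score, dim, threshold = best_split(states, actions, values, derivs)
```

On this toy data the chosen split falls on the first state dimension, since that is the only axis along which action, value and derivative are all homogeneous on each side. Applying `best_split` recursively to each side would grow a tree whose leaves are the convex regions; the weights let one trade off how much each of the three similarity criteria influences the partition.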

Code Implementations: 1 repo
