CL | AI | Apr 15, 2022

Understanding Game-Playing Agents with Natural Language Annotations

Berkeley
arXiv:2204.07531v1639 citations | h-index: 85
Originality: Incremental advance
AI Analysis

The paper provides a tool for interpreting game-playing AI models, though the advance is incremental because it builds on existing methods such as linear probing.

The authors address the problem of interpreting game-playing agents. They build a dataset of 10K human-annotated Go games and use linear probing to predict mentions of domain-specific terms from model representations. They find that concepts such as ko and atari are nontrivially encoded in policy networks, and that the later layers predict these terms best.

We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability. Given a board state and its associated comment, our approach uses linear probing to predict mentions of domain-specific terms (e.g., ko, atari) from the intermediate state representations of game-playing agents like AlphaGo Zero. We find these game concepts are nontrivially encoded in two distinct policy networks, one trained via imitation learning and another trained via reinforcement learning. Furthermore, mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these policy networks encode high-level abstractions similar to those used in the natural language annotations.
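The linear probing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the activations below are synthetic Gaussian vectors standing in for a policy network's intermediate representations, the binary labels stand in for "the comment mentions a term such as ko", and every name (make_layer, train_probe, run_demo) and the signal strengths are hypothetical choices made for illustration. The sketch fits a logistic-regression probe to an "early" layer with a weak label signal and a "late" layer with a strong one, then compares held-out accuracy.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_layer(labels, signal, dim, rng):
    """Synthetic stand-in for one layer's activations (an assumption, not
    real model data): Gaussian noise plus a label-dependent shift of size
    `signal` along the first coordinate."""
    rows = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        row[0] += signal if y == 1 else -signal
        rows.append(row)
    return rows


def train_probe(X, y, epochs=200, lr=0.1):
    """Fit a linear (logistic-regression) probe by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of examples where the probe's sign matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        hits += int(pred == yi)
    return hits / len(y)


def run_demo(seed=0, n_train=300, n_test=100, dim=8):
    """Probe two hypothetical layers and return held-out accuracy for each."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_train + n_test)]
    results = {}
    # Signal strengths are illustrative: weak in "early", strong in "late".
    for name, signal in [("early", 0.1), ("late", 2.0)]:
        X = make_layer(labels, signal, dim, rng)
        w, b = train_probe(X[:n_train], labels[:n_train])
        results[name] = accuracy(w, b, X[n_train:], labels[n_train:])
    return results


if __name__ == "__main__":
    for layer, acc in run_demo().items():
        print(f"{layer} layer probe accuracy: {acc:.2f}")
```

Because the probe is linear, high held-out accuracy suggests the concept is linearly decodable from that layer's representation. In practice a comparison against a control or baseline probe would be needed before reading much into the numbers.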

Code Implementations: 1 repo