LG · AI · Oct 31, 2022

Disentangled (Un)Controllable Features

arXiv:2211.00086v2 · 2 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses interpretability issues in reinforcement learning for high-dimensional states, offering a domain-specific solution.

The paper tackles the interpretability of compressed state representations in MDPs by introducing a method that disentangles latent features into a controllable and an uncontrollable partition, and it demonstrates interpretable planning in procedurally generated maze environments.

In the context of MDPs with high-dimensional states, downstream tasks are predominantly applied on a compressed, low-dimensional representation of the original input space. A variety of learning objectives have therefore been used to attain useful representations. However, these representations usually lack interpretability of the different features. We present a novel approach that is able to disentangle latent features into a controllable and an uncontrollable partition. We illustrate that the resulting partitioned representations are easily interpretable on three types of environments and show that, in a distribution of procedurally generated maze environments, it is feasible to interpretably employ a planning algorithm in the isolated controllable latent partition.

Code Implementations: 1 repo
