LG · ML · Jun 10, 2020

Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts

arXiv:2006.05911v3 · 33 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for interpretable policies in real-world RL applications. The advance is incremental: it builds on existing RL methods and adds a focus on transparency.

The paper tackles the problem of deploying reinforcement learning in the real world by making learned policies more transparent. It proposes a policy iteration scheme that uses a mixture of interpretable experts and matches the performance of neural network policies on continuous action benchmarks.

Reinforcement learning (RL) has demonstrated its ability to solve high dimensional tasks by leveraging non-linear function approximators. However, these successes are mostly achieved by 'black-box' policies in simulated domains. When deploying RL to the real world, several concerns regarding the use of a 'black-box' policy might be raised. In order to make the learned policies more transparent, we propose in this paper a policy iteration scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure, based on a mixture of interpretable experts. Each expert selects a primitive action according to a distance to a prototypical state. A key design decision to keep such experts interpretable is to select the prototypical states from trajectory data. The main technical contribution of the paper is to address the challenges introduced by this non-differentiable prototypical state selection procedure. Experimentally, we show that our proposed algorithm can learn compelling policies on continuous action deep RL benchmarks, matching the performance of neural network based policies, but returning policies that are more amenable to human inspection than neural network or linear-in-feature policies.
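The abstract's description of an expert ("selects a primitive action according to a distance to a prototypical state") can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the softmax over negative squared distances, the `temperature` parameter, the constant per-expert actions, and the random choice of prototypes are all assumptions made for illustration, and the names `pick_prototypes`, `expert_weights`, and `select_action` are hypothetical. The only property taken from the abstract is that prototypical states are drawn from trajectory data and that the nearest prototype determines the action.

```python
import numpy as np


def pick_prototypes(trajectory_states, num_experts, rng):
    """Choose prototypical states from visited states.

    Assumption: uniform random choice stands in for the paper's selection
    procedure, which the abstract does not describe.
    """
    idx = rng.choice(len(trajectory_states), size=num_experts, replace=False)
    return trajectory_states[idx]


def expert_weights(state, prototypes, temperature=1.0):
    """Softmax over negative squared distances to each prototype (assumed form)."""
    sq_dist = np.sum((prototypes - state) ** 2, axis=1)
    logits = -sq_dist / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def select_action(state, prototypes, actions, temperature=1.0):
    """Return the primitive action of the highest-weight expert."""
    weights = expert_weights(state, prototypes, temperature)
    k = int(np.argmax(weights))
    return actions[k], k, weights


rng = np.random.default_rng(0)
trajectory_states = rng.normal(size=(50, 3))   # 50 visited states, 3-D each
prototypes = pick_prototypes(trajectory_states, num_experts=4, rng=rng)
actions = rng.uniform(-1.0, 1.0, size=(4, 2))  # one 2-D primitive action per expert

# Querying the policy exactly at a prototype selects that prototype's expert.
action, k, weights = select_action(prototypes[2], prototypes, actions)
```

Because every prototype is an actual visited state, a person inspecting the policy can read it as a short list of "near this observed state, take this action" rules, which is the interpretability argument the abstract makes.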

Code Implementations: 1 repo