Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning
The work addresses the need for transparent controllers in regulated or trust-sensitive applications, though it is incremental in that it builds on existing knowledge distillation methods.
The paper tackles the problem of making deep reinforcement learning policies explainable by distilling them into locally-specialized linear models using Voronoi state partitioning. The resulting policies are explainable and match or slightly outperform the original black-box policies on a gridworld environment and a classic control task.
Deep Reinforcement Learning is one of the state-of-the-art methods for producing near-optimal system controllers. However, deep RL algorithms train a deep neural network that lacks transparency, which poses challenges when the controller has to meet regulations or foster trust. To alleviate this, one could transfer the learned behaviour into a model that is human-readable by design using knowledge distillation. Often this is done with a single model, which mimics the original model on average but could struggle in more dynamic situations. A key challenge is that this simpler model should strike the right balance between flexibility and complexity or, put differently, between bias and accuracy. We propose a new model-agnostic method to divide the state space into regions in which a simplified, human-understandable model can operate. In this paper, we use Voronoi partitioning to find regions where linear models can achieve similar performance to the original controller. We evaluate our approach on a gridworld environment and a classic control task. We observe that our proposed distillation to locally-specialized linear models produces policies that are explainable, and we show that the distilled policies match or even slightly outperform the black-box policy they are distilled from.
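The idea of distilling into locally-specialized linear models can be illustrated with a minimal sketch. It is not the paper's implementation: the Voronoi centers are hand-picked here rather than found by any search procedure, the "teacher" is a toy function standing in for a black-box policy, and the helper names (assign_regions, fit_local_linear, predict) are hypothetical. Each state is assigned to its nearest center, and one affine model per cell is fitted to the teacher's actions by least squares.

```python
import numpy as np

def assign_regions(states, centers):
    # Voronoi assignment: index of the nearest center (Euclidean distance).
    dists = np.linalg.norm(states[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fit_local_linear(states, actions, centers):
    # Fit one affine model (weights plus bias) per Voronoi cell.
    regions = assign_regions(states, centers)
    X = np.hstack([states, np.ones((len(states), 1))])  # append bias column
    models = []
    for k in range(len(centers)):
        mask = regions == k
        if not mask.any():
            # Empty cell: fall back to a zero model (an arbitrary choice).
            models.append(np.zeros((X.shape[1], actions.shape[1])))
            continue
        W, *_ = np.linalg.lstsq(X[mask], actions[mask], rcond=None)
        models.append(W)
    return models

def predict(states, centers, models):
    # Route each state to the linear model of its own cell.
    regions = assign_regions(states, centers)
    X = np.hstack([states, np.ones((len(states), 1))])
    out = np.empty((len(states), models[0].shape[1]))
    for k, W in enumerate(models):
        mask = regions == k
        out[mask] = X[mask] @ W
    return out

# Toy "teacher" policy (assumption): action = |state|, piecewise linear in 1D.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 1))
actions = np.abs(states)

# Two hand-picked centers put the cell boundary at 0, where the teacher bends.
centers = np.array([[-0.5], [0.5]])
models = fit_local_linear(states, actions, centers)
local_err = np.abs(predict(states, centers, models) - actions).max()

# Baseline: a single global linear model (one cell covering everything).
one_center = np.array([[0.0]])
global_models = fit_local_linear(states, actions, one_center)
global_err = np.abs(predict(states, one_center, global_models) - actions).max()

print(f"max error, two local models: {local_err:.2e}")
print(f"max error, one global model: {global_err:.2e}")
```

In this toy case the two local models reproduce the teacher almost exactly, while the single global model cannot, which mirrors the abstract's point that one simple model mimics the original only on average. Each local model is just a weight and a bias per state dimension, so it can be read directly.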