Distilling Deep RL Models Into Interpretable Neuro-Fuzzy Systems
The work addresses the interpretability problem in deep RL for practitioners who need transparent decision-making, though the contribution is incremental, since it builds on existing distillation and neuro-fuzzy methods.
The paper tackles the black-box nature of deep reinforcement learning models by distilling a deep Q-network (DQN) policy into a compact neuro-fuzzy controller that nearly matches the DQN's performance on three OpenAI Gym environments while using only 2 to 6 fuzzy rules.
Deep reinforcement learning uses a deep neural network to encode a policy. This achieves very good performance in a wide range of applications but is widely regarded as a black-box model. Neuro-fuzzy controllers offer a more interpretable alternative to deep networks. Unfortunately, neuro-fuzzy controllers often need a large number of rules to solve relatively simple tasks, which makes them difficult to interpret. In this work, we present an algorithm to distill the policy of a deep Q-network (DQN) into a compact neuro-fuzzy controller. This lets us train, through distillation, compact neuro-fuzzy controllers that solve tasks they are unable to solve when trained directly, combining the flexibility of deep reinforcement learning with the interpretability of compact rule bases. We demonstrate the algorithm on three well-known environments from OpenAI Gym, where we nearly match the performance of a DQN agent using only 2 to 6 fuzzy rules.
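To make the distillation idea concrete, here is a minimal sketch. It is not the authors' algorithm, and the abstract does not specify the controller form or the distillation loss. The sketch assumes a zero-order Takagi-Sugeno rule base with Gaussian memberships (product t-norm, normalized firing strengths) and a squared-error loss between student and teacher Q-values. `teacher_q` is a hypothetical stand-in for a trained DQN. The antecedents (rule centers and widths) are fixed by hand, so the consequents can be fit in closed form with least squares, as in the consequent step of ANFIS-style hybrid learning. A full method would likely also tune the antecedents.

```python
import numpy as np

def teacher_q(states):
    """Hypothetical stand-in for a trained DQN: Q-values for 2 actions."""
    score = states[:, 0] + 0.5 * states[:, 1]
    return np.stack([-score, score], axis=1)

def firing_strengths(states, centers, widths):
    """Normalized rule activations of a zero-order Takagi-Sugeno rule base.

    Each rule has one Gaussian membership function per state dimension;
    memberships are combined with the product t-norm, then normalized
    across rules so each row sums to 1.
    """
    diff = states[:, None, :] - centers[None, :, :]           # (batch, rules, dim)
    memberships = np.exp(-0.5 * (diff / widths[None]) ** 2)    # (batch, rules, dim)
    w = memberships.prod(axis=2)                               # (batch, rules)
    return w / (w.sum(axis=1, keepdims=True) + 1e-12)

def student_q(states, centers, widths, consequents):
    """Controller output: firing-weighted average of per-rule Q-vectors."""
    return firing_strengths(states, centers, widths) @ consequents

def distill(states, centers, widths):
    """Fit rule consequents so the student's Q-values match the teacher's.

    With fixed antecedents the output is linear in the consequents, so
    ordinary least squares minimizes the squared distillation loss.
    """
    phi = firing_strengths(states, centers, widths)
    consequents, *_ = np.linalg.lstsq(phi, teacher_q(states), rcond=None)
    return consequents

rng = np.random.default_rng(0)
# Assumption: distillation states are sampled uniformly; in practice they
# would come from rollouts or the teacher's replay buffer.
states = rng.uniform(-1.0, 1.0, size=(2000, 2))
centers = np.array([[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]])  # 4 rules
widths = np.full_like(centers, 0.5)
consequents = distill(states, centers, widths)

# Evaluate how often the 4-rule student picks the same greedy action.
test_states = rng.uniform(-1.0, 1.0, size=(500, 2))
student_actions = student_q(test_states, centers, widths, consequents).argmax(axis=1)
teacher_actions = teacher_q(test_states).argmax(axis=1)
agreement = np.mean(student_actions == teacher_actions)
print(f"greedy-action agreement: {agreement:.2%}")
```

Each row of `consequents` is one readable rule of the form "if the state is near this center, use these Q-values". That readability is the interpretability benefit the abstract attributes to compact rule bases. Greedy-action agreement is only a proxy. The paper's evaluation is episode performance in the Gym environments.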