LG, Nov 19, 2015

Policy Distillation

arXiv:1511.06295v2, 858 citations
Originality: Highly original
AI Analysis

This paper addresses the need for more efficient and compact policies in reinforcement learning, particularly for visual tasks, with incremental improvements in multi-task performance.

The paper tackles the problem that deep reinforcement learning needs large, task-specific networks and extensive training. It introduces policy distillation, a method for extracting an agent's policy and training a smaller, more efficient network that performs at the expert level, and it demonstrates in the Atari domain that a multi-task distilled agent outperforms both the single-task teachers and jointly-trained DQN agents.

Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
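The abstract does not say how the student network is trained to match the teacher. As a minimal sketch under stated assumptions, one plausible formulation treats the teacher's Q-values as logits, sharpens them with a low softmax temperature, and minimizes the KL divergence from that distribution to the student's action distribution. The function names, the temperature value, and the choice of KL divergence below are illustrative assumptions, not details confirmed by the text above.

```python
import math


def log_softmax(values, temperature=1.0):
    """Numerically stable log-softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_kl_loss(teacher_q, student_logits, temperature=0.01):
    """KL(teacher || student) over actions for a single state.

    teacher_q: per-action Q-values produced by the teacher (e.g. a DQN).
    student_logits: per-action unnormalized scores from the student.
    temperature: assumed sharpening factor, applied to the teacher only;
        values below 1 push the target toward the teacher's greedy action.
    """
    log_p = log_softmax(teacher_q, temperature)  # teacher target
    log_q = log_softmax(student_logits)          # student prediction
    # Actions with negligible teacher probability contribute ~0 to the sum.
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    teacher_q = [1.0, 2.0, 0.5]      # hypothetical Q-values for 3 actions
    agreeing = [0.0, 5.0, 0.0]       # student prefers the same action
    disagreeing = [5.0, 0.0, 0.0]    # student prefers a different action
    print(distillation_kl_loss(teacher_q, agreeing))
    print(distillation_kl_loss(teacher_q, disagreeing))
```

In this sketch the loss is small when the student ranks the teacher's greedy action highest and grows when it prefers another action. Consolidating several task-specific teachers into one student could, under the same assumptions, average this loss over states drawn from each task in turn.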

Code Implementations: 1 repo
