LG · AI · ML · Oct 14, 2019

On the Expressivity of Neural Networks for Deep Reinforcement Learning

arXiv:1910.05927v3 · 19 citations
Originality: Incremental advance
AI Analysis

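This work addresses a limitation of policy approximation in reinforcement learning: in some MDPs, neural networks struggle to represent the optimal policy or Q-function even when the dynamics are simple. It offers researchers and practitioners a test-time planning method that improves performance on such MDPs. The advance is incremental, since the method builds on existing planning techniques.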

The paper compares model-free and model-based reinforcement learning by analyzing the expressive power of neural networks for policies, Q-functions, and dynamics, showing that optimal Q-functions and policies can be more complex than dynamics in many MDPs, even in one-dimensional continuous state spaces. It introduces a multi-step model-based bootstrapping planner (BOOTS) that improves performance on MuJoCo benchmark tasks when applied to model-based or model-free algorithms at test time.
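The claim that value functions can be far more complex than the dynamics can be made concrete with a toy example. This is our own illustration, not the construction used in the paper. Take a one-dimensional state $s \in [0, 1)$, the action-free doubling map $s' = 2s \bmod 1$ (two linear pieces), and reward $r(s) = \mathbb{1}[s \ge 1/2]$. The doubling map shifts the binary expansion of $s$ one digit to the left, so the $H$-step discounted value depends on the first $H$ binary digits of $s$. For $\gamma = 0.9$, it has $2^H$ constant pieces. The sketch counts those pieces on a dyadic grid using exact integer arithmetic. The function names are hypothetical.

```python
"""Toy illustration (hypothetical, not the paper's construction): simple
dynamics can induce a value function with exponentially many pieces."""


def bits_of(k: int, n_bits: int) -> list[int]:
    # Binary digits of s = k / 2**n_bits, most significant first:
    # s = 0.b1 b2 ... bn (base 2).
    return [(k >> (n_bits - 1 - t)) & 1 for t in range(n_bits)]


def value_h(k: int, n_bits: int, horizon: int, gamma: float) -> float:
    # The doubling map s_{t+1} = 2 s_t mod 1 shifts the binary expansion left,
    # so the reward r(s_t) = 1[s_t >= 1/2] is exactly the (t+1)-th bit of s_0.
    bits = bits_of(k, n_bits)
    return sum(gamma**t * bits[t] for t in range(horizon))


def count_pieces(n_bits: int, horizon: int, gamma: float) -> int:
    # Evaluate V_H on the grid s = k / 2**n_bits (requires horizon <= n_bits)
    # and count maximal runs of equal value; each run is one constant piece.
    values = [value_h(k, n_bits, horizon, gamma) for k in range(2**n_bits)]
    breaks = sum(1 for a, b in zip(values, values[1:]) if a != b)
    return breaks + 1


if __name__ == "__main__":
    for h in range(1, 9):
        print(h, count_pieces(n_bits=10, horizon=h, gamma=0.9))
```

Running it prints each horizon next to its piece count. The count doubles with every extra step (2, 4, ..., 256), while the dynamics and the reward each keep only two pieces.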

We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics. We hypothesize that many real-world MDPs have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the policies it produces can approximate the optimal policy significantly better than a neural network parameterization can, whereas both model-free and model-based policy optimization rely on such a parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak $Q$-function into a stronger policy. Empirical results show that applying BOOTS at test time on top of model-based or model-free policy optimization algorithms improves performance on MuJoCo benchmark tasks.
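The abstract describes BOOTS only at a high level: plan several steps ahead with a dynamics model, then bootstrap the remaining return with a $Q$-function. A minimal sketch of that idea scores each action sequence $a_0, \dots, a_{k-1}$ by $\sum_{i=0}^{k-1} \gamma^i r(s_i, a_i) + \gamma^k \max_a Q(s_k, a)$ with $s_{i+1} = \hat f(s_i, a_i)$, and executes the first action of the best sequence. It makes three assumptions: a known reward function, a small discrete action set, and exhaustive search over length-$k$ sequences. The paper's MuJoCo tasks have continuous actions, so its actual optimizer likely differs. The names `boots_action`, `toy_model`, `toy_reward`, and `zero_q` are hypothetical.

```python
import itertools
from typing import Callable, Sequence

State = int  # toy integer states; any state type the model accepts works
Action = int


def boots_action(
    s: State,
    actions: Sequence[Action],
    model: Callable[[State, Action], State],
    reward: Callable[[State, Action], float],
    q: Callable[[State, Action], float],
    k: int,
    gamma: float,
) -> Action:
    """Return the first action of the best k-step plan under the model.

    Hypothetical sketch: exhaustively scores every length-k action sequence by
    its discounted model-predicted reward plus gamma**k * max_a q(s_k, a).
    """
    if k < 1:
        raise ValueError("lookahead depth k must be at least 1")
    best_value, best_first = float("-inf"), actions[0]
    for plan in itertools.product(actions, repeat=k):
        state, ret = s, 0.0
        for i, a in enumerate(plan):
            # Accumulate discounted reward along the model rollout.
            ret += gamma**i * reward(state, a)
            state = model(state, a)
        # Bootstrap the rest of the return with the (possibly weak) Q-function.
        ret += gamma**k * max(q(state, a) for a in actions)
        if ret > best_value:  # strict '>' keeps the earliest plan on ties
            best_value, best_first = ret, plan[0]
    # MPC-style: only the first action is executed; replan at the next state.
    return best_first


# Toy chain (hypothetical): move left/right, reward 1 for arriving at state 2.
def toy_model(s: State, a: Action) -> State:
    return s + a


def toy_reward(s: State, a: Action) -> float:
    return 1.0 if s + a == 2 else 0.0


def zero_q(s: State, a: Action) -> float:
    return 0.0  # an uninformative Q-function


if __name__ == "__main__":
    for depth in (1, 2):
        print(depth, boots_action(0, (-1, 1), toy_model, toy_reward, zero_q, depth, 0.9))
```

With the zero $Q$-function, the one-step lookahead sees a tie and falls back to the first listed action (`-1`). The two-step lookahead reaches state 2 and returns `1`. Exhaustive search costs $|A|^k$ model rollouts per decision, so a practical planner would substitute a sampling-based or gradient-based optimizer. That substitution is an assumption here, not a detail from the abstract.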

Code Implementations: 1 repo