LG, RO · Oct 9, 2020

Learning to Locomote: Understanding How Environment Design Matters for Deep Reinforcement Learning

arXiv:2010.04304v1 · 60 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of unreliable RL outcomes in continuous-action control for animation and robotics. It argues that environment design is a critical but often overlooked factor. The contribution is incremental in the sense that it focuses on design choices rather than on algorithmic advances.

The paper investigates how design choices in reinforcement learning environments, such as state representations and reward structures, significantly affect the performance and brittleness of learned locomotion policies. It demonstrates that these choices can lead to substantial variations in results, highlighting the need for careful environment design.

Learning to locomote is one of the most common tasks in physics-based animation and deep reinforcement learning (RL). A learned policy is the product of the problem to be solved, as embodied by the RL environment, and the RL algorithm. While enormous attention has been devoted to RL algorithms, much less is known about the impact of design choices for the RL environment. In this paper, we show that environment design matters in significant ways and document how it can contribute to the brittle nature of many RL results. Specifically, we examine choices related to state representations, initial state distributions, reward structure, control frequency, episode termination procedures, curriculum usage, the action space, and the torque limits. We aim to stimulate discussion around such choices, which in practice strongly impact the success of RL when applied to continuous-action control problems of interest to animation, such as learning to locomote.
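The design choices listed in the abstract can be made explicit as environment parameters, which makes it easier to vary them one at a time or in combination. The sketch below is a minimal, hypothetical Python illustration, not the authors' code. `EnvDesign`, `design_grid`, every field name, and the default values (60 Hz control, 600 Hz simulation, a 100 N·m torque limit) are assumptions. It shows two concrete mechanics. Control frequency determines how many simulator steps each policy action is held for, and a torque limit is enforced by clipping each joint torque.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import List, Sequence


# Hypothetical container for the environment-design knobs named in the abstract.
# Field names and defaults are illustrative assumptions, not the paper's settings.
@dataclass(frozen=True)
class EnvDesign:
    state_repr: str = "joint_angles_and_velocities"  # state representation
    init_state: str = "fixed_pose"                   # initial state distribution
    energy_penalty_weight: float = 0.0               # one reward-structure knob
    control_hz: int = 60                             # policy (control) frequency
    sim_hz: int = 600                                # physics simulation frequency
    terminate_on_fall: bool = True                   # episode termination procedure
    use_curriculum: bool = False                     # curriculum usage
    action_space: str = "torque"                     # e.g. "torque" or "pd_target"
    torque_limit: float = 100.0                      # symmetric per-joint limit (N·m)

    def frame_skip(self) -> int:
        """Number of simulator steps each policy action is held for."""
        if self.sim_hz % self.control_hz != 0:
            raise ValueError("sim_hz must be an integer multiple of control_hz")
        return self.sim_hz // self.control_hz

    def clip_torques(self, torques: Sequence[float]) -> List[float]:
        """Clip each joint torque into [-torque_limit, torque_limit]."""
        lim = self.torque_limit
        return [max(-lim, min(lim, t)) for t in torques]


def design_grid(base: EnvDesign, **options: Sequence) -> List[EnvDesign]:
    """Return one EnvDesign per combination of the given field values."""
    keys = list(options)
    return [
        replace(base, **dict(zip(keys, values)))
        for values in product(*(options[k] for k in keys))
    ]


if __name__ == "__main__":
    base = EnvDesign()
    variants = design_grid(
        base, control_hz=[30, 60, 120], action_space=["torque", "pd_target"]
    )
    print(len(variants))                               # 6
    print(base.frame_skip())                           # 10
    print(base.clip_torques([150.0, -20.0, -300.0]))   # [100.0, -20.0, -100.0]
```

Sweeping a grid like this and training several random seeds per variant is one way to expose the kind of result variability the paper describes. The training and evaluation loop is omitted because it depends on the simulator and RL library in use.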
