ML · LG · RO · SY · Feb 8, 2015

From Pixels to Torques: Policy Learning with Deep Dynamical Models

arXiv:1502.02251v3 · 192 citations
Originality: Incremental advance
AI Analysis

This paper addresses data-efficient learning in continuous state-action spaces from high-dimensional observations, a key challenge for autonomous systems. It is an incremental improvement over existing methods.

The paper tackles the pixels-to-torques problem, in which an agent learns a closed-loop control policy from pixel observations alone. It introduces a data-efficient, model-based reinforcement learning algorithm built on a deep dynamical model that jointly learns a low-dimensional embedding of images and a predictive model in that embedding. According to the authors, the approach learns quickly and scales to high-dimensional state spaces.

Data-efficient learning in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. In this paper, we consider one instance of this challenge, the pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that uses deep auto-encoders to learn a low-dimensional embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning ensures that not only static but also dynamic properties of the data are accounted for. This is crucial for long-term predictions, which lie at the core of the adaptive model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art reinforcement learning methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces and is an important step toward fully autonomous learning from pixels to torques.
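
The two ideas in the abstract, a joint reconstruction-plus-prediction objective and planning in the learned feature space, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes linear maps in place of deep auto-encoders, a one-step linear transition model, and a simple random-shooting planner standing in for the paper's adaptive model predictive control. All names (`init_params`, `joint_loss`, `plan_action`) and all dimensions are hypothetical, and no training loop is shown.

```python
import numpy as np

def init_params(rng, obs_dim, act_dim, latent_dim):
    """Small random weights (assumption: linear stand-ins for deep networks)."""
    s = 0.1
    return {
        "enc": s * rng.standard_normal((latent_dim, obs_dim)),   # image -> feature
        "dec": s * rng.standard_normal((obs_dim, latent_dim)),   # feature -> image
        "A": s * rng.standard_normal((latent_dim, latent_dim)),  # feature dynamics
        "B": s * rng.standard_normal((latent_dim, act_dim)),     # effect of the action
    }

def encode(p, y):
    return np.tanh(y @ p["enc"].T)

def decode(p, z):
    return z @ p["dec"].T

def predict_next(p, z, u):
    return z @ p["A"].T + u @ p["B"].T

def joint_loss(p, frames, actions):
    """Reconstruction error plus one-step prediction error, both in image space.

    frames: (T, obs_dim) flattened images; actions: (T-1, act_dim).
    Summing both terms means the embedding is shaped by the dynamics as well
    as by static image content.
    """
    z = encode(p, frames[:-1])             # features of frames 0..T-2
    recon = decode(p, z)                   # reconstruct the same frames
    z_next = predict_next(p, z, actions)   # predicted features of frames 1..T-1
    pred = decode(p, z_next)               # predicted next frames
    recon_err = np.mean((recon - frames[:-1]) ** 2)
    pred_err = np.mean((pred - frames[1:]) ** 2)
    return recon_err + pred_err, recon_err, pred_err

def plan_action(p, frame, goal_frame, rng, horizon=5, n_candidates=64):
    """Random-shooting planner (assumption: not the paper's optimizer).

    Samples action sequences, rolls them out in feature space, and returns
    the first action of the sequence that stays closest to the goal feature.
    """
    act_dim = p["B"].shape[1]
    z_goal = encode(p, goal_frame[None, :])
    z = np.repeat(encode(p, frame[None, :]), n_candidates, axis=0)
    U = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, act_dim))
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        z = predict_next(p, z, U[:, t, :])
        cost += np.sum((z - z_goal) ** 2, axis=1)
    return U[np.argmin(cost), 0]

# Toy usage with random data in place of real images.
rng = np.random.default_rng(0)
params = init_params(rng, obs_dim=16, act_dim=1, latent_dim=2)
frames = rng.standard_normal((10, 16))
actions = rng.uniform(-1.0, 1.0, size=(9, 1))
total, recon_err, pred_err = joint_loss(params, frames, actions)
u0 = plan_action(params, frames[0], frames[-1], rng)
```

In a closed loop, only the first planned action would be applied before replanning from the next observed frame, which is the receding-horizon pattern that model predictive control relies on.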

