LG, SY | Jan 12, 2019

Learning Accurate Extended-Horizon Predictions of High Dimensional Trajectories

arXiv:1901.03895v1
Originality: Incremental advance
AI Analysis

This paper addresses sample efficiency in reinforcement learning for complex tasks such as simulated Mars landing. It is an incremental improvement over existing predictive coding methods.

The paper tackles accurate long-horizon prediction of high-dimensional trajectories by introducing a model architecture based on predictive coding. The model produces accurate predictions immediately from the first observation and, when used in a Dyna-style algorithm with proximal policy optimization, yields a 2X reduction in sample complexity for policy learning.

We present a novel predictive model architecture based on the principles of predictive coding that enables open-loop prediction of future observations over extended horizons. There are two key innovations. First, whereas current methods typically learn to make long-horizon open-loop predictions using a multi-step cost function, we instead run the model open loop in the forward pass during training. Second, current predictive coding models initialize the representation layer's hidden state to a constant value at the start of an episode, and consequently typically require multiple steps of interaction with the environment before the model begins to produce accurate predictions. Instead, we learn a mapping from the first observation in an episode to the hidden state, allowing the trained model to immediately produce accurate predictions. We compare the performance of our architecture to a standard predictive coding model and demonstrate the ability of the model to make accurate long-horizon open-loop predictions of simulated Doppler radar altimeter readings during a six-degree-of-freedom Mars landing. Finally, we demonstrate a 2X reduction in sample complexity by using the model to implement a Dyna-style algorithm to accelerate policy learning with proximal policy optimization.
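
The two ideas in the abstract can be illustrated with a small sketch: the hidden state is initialized from the first observation rather than a constant, and the loss is computed on a rollout in which the model consumes its own predictions. This is not the authors' architecture. The class name, layer sizes, tanh recurrence, random untrained weights, and the omission of actions are all assumptions made for illustration, and no training step is shown.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix; stands in for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class OpenLoopPredictor:
    """Toy recurrent predictor (hypothetical, not the paper's model)."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.W_init = rand_matrix(hidden_dim, obs_dim, rng)    # first observation -> h0
        self.W_hh = rand_matrix(hidden_dim, hidden_dim, rng)   # hidden -> hidden
        self.W_xh = rand_matrix(hidden_dim, obs_dim, rng)      # input -> hidden
        self.W_out = rand_matrix(obs_dim, hidden_dim, rng)     # hidden -> prediction

    def init_hidden(self, first_obs):
        # Mapping from the first observation to the hidden state,
        # used here instead of a constant (e.g. all-zero) initial state.
        return [math.tanh(v) for v in matvec(self.W_init, first_obs)]

    def step(self, h, x):
        pre = [a + b for a, b in zip(matvec(self.W_hh, h), matvec(self.W_xh, x))]
        h_next = [math.tanh(v) for v in pre]
        return h_next, matvec(self.W_out, h_next)

    def rollout(self, first_obs, horizon):
        # Open loop: after the first observation, the model only ever
        # sees its own previous prediction as input.
        h = self.init_hidden(first_obs)
        x = first_obs
        preds = []
        for _ in range(horizon):
            h, x = self.step(h, x)
            preds.append(x)
        return preds


def open_loop_loss(model, episode):
    """Mean squared error of an open-loop rollout against a recorded episode."""
    preds = model.rollout(episode[0], len(episode) - 1)
    total, count = 0.0, 0
    for pred, target in zip(preds, episode[1:]):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count
```

In this sketch the rollout depends only on the first observation, so later observations in an episode affect the loss but never the model's inputs. A training loop would minimize `open_loop_loss` over the weights, including `W_init`.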
