Learning Human Motion Models for Long-term Predictions
This work addresses the challenge of long-term motion prediction for applications like animation and robotics, though it appears incremental as it builds on existing LSTM and autoencoder techniques.
The paper tackles the problem of generating realistic human motion sequences over long time horizons by proposing a Dropout Autoencoder LSTM architecture, which outperforms state-of-the-art methods on large motion-capture datasets and produces natural-looking sequences without catastrophic drift.
We propose a new architecture for learning predictive spatio-temporal motion models from data alone. Our approach, dubbed the Dropout Autoencoder LSTM, synthesizes natural-looking motion sequences over long time horizons without catastrophic drift or motion degradation. The model consists of two components: a 3-layer recurrent neural network that models temporal aspects, and a novel autoencoder that is trained to implicitly recover the spatial structure of the human skeleton by randomly removing information about joints during training. This Dropout Autoencoder (D-AE) is then used to filter each pose predicted by the LSTM, reducing the accumulation of error and hence drift over time. Furthermore, we propose new evaluation protocols to assess the quality of synthetic motion sequences, even those for which no ground-truth data exists. The proposed protocols can be applied to generated sequences of arbitrary length. Finally, we evaluate our method on two of the largest motion-capture datasets available to date and show that it outperforms the state of the art on a variety of actions, including cyclic and acyclic motion, and that it produces natural-looking sequences over longer time horizons than previous methods.
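
The two mechanisms described in the abstract, joint-level dropout during autoencoder training and filtering of each predicted pose during generation, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `drop_joints` and `rollout`, the flat list-of-floats pose layout with three values per joint, the choice to zero out dropped joints, and the stateless `predict_next` and `filter_pose` callables (standing in for the trained LSTM and D-AE) are all assumptions made for illustration.

```python
import random


def drop_joints(pose, drop_prob, rng, dims_per_joint=3):
    """Zero out every coordinate of randomly chosen joints.

    Assumption: a pose is a flat list of floats with `dims_per_joint`
    consecutive values per joint. A denoising autoencoder would be
    trained to reconstruct the original pose from this corrupted input.
    """
    if len(pose) % dims_per_joint != 0:
        raise ValueError("pose length must be a multiple of dims_per_joint")
    n_joints = len(pose) // dims_per_joint
    # One keep/drop decision per joint, not per coordinate.
    keep = [rng.random() >= drop_prob for _ in range(n_joints)]
    corrupted = [
        value if keep[i // dims_per_joint] else 0.0
        for i, value in enumerate(pose)
    ]
    return corrupted, keep


def rollout(seed_pose, predict_next, filter_pose, n_steps):
    """Generate poses autoregressively: predict, filter, feed back.

    `predict_next` and `filter_pose` are hypothetical placeholders for
    the recurrent predictor and the autoencoder filter. Each filtered
    pose becomes the input of the next step, so errors are corrected
    before they can accumulate.
    """
    poses = []
    pose = list(seed_pose)
    for _ in range(n_steps):
        pose = filter_pose(predict_next(pose))
        poses.append(pose)
    return poses
```

In use, `drop_joints([1.0] * 12, 0.5, random.Random(0))` would corrupt a four-joint pose one whole joint at a time, and `rollout` would be called with the trained models wrapped as callables. A real recurrent predictor also carries hidden state between steps, which this sketch leaves to the caller, for example by closing over it inside `predict_next`.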