Multimodal Deep Generative Models for Trajectory Prediction: A Conditional Variational Autoencoder Approach
This incremental tutorial paper helps researchers and practitioners in robotics understand and apply CVAE-based methods for predicting human behavior, with the aim of improving robot safety and planning.
This tutorial paper reviews and organizes state-of-the-art methods for human behavior prediction in robotics, focusing on a conditional variational autoencoder (CVAE) approach that generates multimodal probability distributions over future human trajectories. It provides a rigorous yet accessible description of this data-driven method and highlights design considerations for use in model-based planning for human-robot interactions.
Human behavior prediction models enable robots to anticipate how humans may react to their actions, and hence are instrumental in devising safe and proactive robot planning algorithms. However, modeling complex interaction dynamics and capturing the many possible outcomes of such interactive settings is very challenging, which has recently prompted the study of several different approaches. In this work, we provide a self-contained tutorial on a conditional variational autoencoder (CVAE) approach to human behavior prediction which, at its core, can produce a multimodal probability distribution over future human trajectories conditioned on past interactions and candidate future robot actions. Specifically, the goals of this tutorial paper are to (i) review and build a taxonomy of state-of-the-art methods in human behavior prediction, from physics-based to purely data-driven methods; (ii) provide a rigorous yet easily accessible description of a data-driven, CVAE-based approach; (iii) highlight important design characteristics that make this an attractive model to use in the context of model-based planning for human-robot interactions; and (iv) give important design considerations for using this class of models.
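To make "multimodal distribution conditioned on past interactions and candidate robot actions" concrete, the sketch below assumes a discrete latent variable z, so that p(y | x) = sum over z of p(z | x) p(y | x, z), where x bundles the human's history and a candidate robot future. Sampling is ancestral: first draw a mode z, then draw a trajectory y given that mode. Everything here is a hypothetical toy. `prior_probs` and `decode` are hand-written stand-ins for the learned encoder and decoder networks a real CVAE would use, and `MODE_TURNS` is an invented set of three behavior modes. This is not the paper's model.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical discrete latent modes: each z is a heading change (radians).
MODE_TURNS = [-math.pi / 4, 0.0, math.pi / 4]  # turn right, go straight, turn left


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prior_probs(history: Sequence[Point], robot_future: Sequence[Point]) -> List[float]:
    """Toy stand-in for p(z | x): modes heading away from the robot's planned endpoint score higher."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    heading = math.atan2(y1 - y0, x1 - x0)
    rx, ry = robot_future[-1]
    logits = []
    for turn in MODE_TURNS:
        h = heading + turn
        # Point one unit ahead of the human along this mode's heading.
        px, py = x1 + math.cos(h), y1 + math.sin(h)
        logits.append(math.hypot(px - rx, py - ry))  # farther from robot -> larger logit
    return softmax(logits)


def decode(history: Sequence[Point], z: int, horizon: int,
           noise_std: float, rng: random.Random) -> List[Point]:
    """Toy stand-in for p(y | x, z): constant-speed rollout along mode z's heading plus Gaussian noise."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    speed = math.hypot(x1 - x0, y1 - y0)
    h = math.atan2(y1 - y0, x1 - x0) + MODE_TURNS[z]
    x, y = x1, y1
    traj = []
    for _ in range(horizon):
        x += speed * math.cos(h) + rng.gauss(0.0, noise_std)
        y += speed * math.sin(h) + rng.gauss(0.0, noise_std)
        traj.append((x, y))
    return traj


def sample_futures(history: Sequence[Point], robot_future: Sequence[Point],
                   num_samples: int = 100, horizon: int = 5,
                   noise_std: float = 0.05, seed: int = 0):
    """Ancestral sampling from the mixture: z ~ p(z | x), then y ~ p(y | x, z)."""
    rng = random.Random(seed)
    probs = prior_probs(history, robot_future)
    samples = []
    for _ in range(num_samples):
        z = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        samples.append((z, decode(history, z, horizon, noise_std, rng)))
    return probs, samples


if __name__ == "__main__":
    human_history = [(0.0, 0.0), (1.0, 0.0)]  # human walking along +x
    for robot_plan in ([(2.0, 1.0)], [(2.0, -1.0)]):
        probs, _ = sample_futures(human_history, robot_plan, num_samples=10)
        print(robot_plan, [round(p, 3) for p in probs])
```

Changing only the candidate robot future shifts probability mass between modes. A planner could exploit this by scoring each candidate robot action against the samples it induces. In a learned CVAE, the logits and the per-mode trajectory distributions would come from trained networks rather than the geometric heuristics above.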