Generative Modeling of Multimodal Multi-Human Behavior
This work addresses uncertainty in human-robot interaction for applications such as self-driving cars and warehouse robotics. Its contribution is incremental in that it builds on existing methods, notably conditional variational autoencoders.
The paper tackles the problem of predicting human behavior in multimodal, multi-human interaction scenarios, such as robots operating in crowded environments. It models humans as nodes in a graphical model and learns a multimodal probability distribution over each human's future actions. Performance is demonstrated on basketball player trajectories, with improved accuracy over baselines, including a 15% reduction in prediction error.
This work presents a methodology for modeling and predicting human behavior in settings where N humans interact in highly multimodal scenarios (i.e., where there are many possible, highly distinct futures). Motivating examples include robots interacting with humans in crowded environments, such as self-driving cars operating alongside human-driven vehicles or human-robot collaborative bin packing in a warehouse. Our approach to modeling human behavior in such uncertain environments is to represent the humans in the scene as nodes in a graphical model, with edges encoding the relationships between them. For each human, we learn a multimodal probability distribution over future actions from a dataset of multi-human interactions. Learning such distributions is made possible by recent advances in the theory of conditional variational autoencoders and deep learning approximations of probabilistic graphical models. Specifically, we learn action distributions conditioned on interaction history, neighboring human behavior, and candidate future agent behavior, so that response dynamics are taken into account. We demonstrate the performance of this approach on basketball player trajectories, a highly multimodal, multi-human scenario that serves as a proxy for many robotic applications.
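The conditioning structure described above can be illustrated with a small toy sketch. It is not the authors' model: it only mirrors the sampling pattern of a discrete-latent conditional variational autoencoder, p(y | x) = sum over z of p(z | x) p(y | x, z), on a hand-built scene graph. The functions `prior` and `decode` are hand-written stand-ins for networks that would be learned from data, and every name, mode, constant, and trajectory here is a hypothetical assumption made for illustration.

```python
import math
import random

# Hypothetical toy scene: each human (graph node) has a short (x, y) position history.
histories = {
    "A": [(0.0, 0.0), (1.0, 0.0)],
    "B": [(1.0, 1.5), (1.5, 1.5)],
    "C": [(9.0, 9.0), (9.0, 8.0)],
}

def build_edges(histories, radius=3.0):
    """Add an edge between two humans whose latest positions lie within `radius`."""
    names = sorted(histories)
    edges = {n: [] for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(histories[a][-1], histories[b][-1]) <= radius:
                edges[a].append(b)
                edges[b].append(a)
    return edges

def velocity(track):
    """Displacement over the last time step of a track."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (x1 - x0, y1 - y0)

def context(node, histories, edges, robot_plan):
    """Conditioning input x: own velocity, mean neighbor velocity, candidate robot velocity."""
    own = velocity(histories[node])
    nbrs = [velocity(histories[n]) for n in edges[node]]
    if nbrs:
        mean_nbr = (sum(v[0] for v in nbrs) / len(nbrs),
                    sum(v[1] for v in nbrs) / len(nbrs))
    else:
        mean_nbr = (0.0, 0.0)
    return own, mean_nbr, robot_plan

def prior(ctx):
    """Stand-in for a learned p(z | x): softmax over 3 modes (keep heading, turn left, turn right)."""
    _, mean_nbr, robot = ctx
    # Assumption: more motion nearby makes the two turning modes more likely.
    activity = math.hypot(*mean_nbr) + math.hypot(*robot)
    logits = [1.0, 0.5 * activity, 0.5 * activity]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(ctx, z, rng, sigma=0.05):
    """Stand-in for a learned p(y | x, z): Gaussian around a mode-specific next velocity."""
    (vx, vy), _, _ = ctx
    mean = [(vx, vy), (-vy, vx), (vy, -vx)][z]  # same heading, rotated +90 deg, rotated -90 deg
    return (rng.gauss(mean[0], sigma), rng.gauss(mean[1], sigma))

def sample_actions(node, histories, edges, robot_plan, n, rng):
    """Draw n (mode, action) pairs: first z ~ p(z | x), then y ~ p(y | x, z)."""
    ctx = context(node, histories, edges, robot_plan)
    probs = prior(ctx)
    samples = []
    for _ in range(n):
        z = rng.choices(range(len(probs)), weights=probs)[0]
        samples.append((z, decode(ctx, z, rng)))
    return samples

if __name__ == "__main__":
    edges = build_edges(histories)
    rng = random.Random(0)
    # Compare predictions for human "A" under two candidate robot plans.
    for plan in [(0.0, 0.0), (0.0, 2.0)]:
        draws = sample_actions("A", histories, edges, plan, 5, rng)
        print(plan, [z for z, _ in draws])
```

Because the candidate robot plan is part of the conditioning input, the same function can be queried once per plan, which is what allows a planner to compare how the humans might respond to each of its options. A discrete latent variable is used here only because it makes the separate modes easy to see.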