BEHAVE: Dataset and Method for Tracking Human Object Interactions
BEHAVE addresses a critical gap for applications such as gaming, VR, and robotics by providing the first comprehensive dataset and tracking method for full-body human-object interactions, although the method is incremental in that it builds on existing statistical body models.
The authors tackle the lack of a dataset for full-body human-object interactions in natural environments by introducing the BEHAVE dataset, which includes around 15k frames of multi-view RGBD data with 3D annotations. They also develop a method that jointly tracks humans and objects using a portable multi-camera setup.
Modelling interactions between humans and objects in natural environments is central to many applications, including gaming, virtual and mixed reality, human behavior analysis, and human-robot collaboration. This challenging operating scenario requires generalization to a vast number of objects, scenes, and human actions. Unfortunately, no such dataset exists. Moreover, this data needs to be acquired in diverse natural environments, which rules out 4D scanners and marker-based capture systems. We present the BEHAVE dataset, the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them. We record around 15k frames at 5 locations with 8 subjects performing a wide range of interactions with 20 common objects. We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup. Our key insight is to predict correspondences from the human and the object to a statistical body model to obtain human-object contacts during interactions. Our approach can record and track not just the humans and objects but also their interactions, modeled as surface contacts, in 3D. Our code and data can be found at: http://virtualhumans.mpi-inf.mpg.de/behave
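To make the notion of "surface contacts" concrete, the following minimal Python sketch marks an object point as in contact when its nearest body-surface point lies within a distance threshold. This is an illustration only and not the authors' method, which predicts correspondences to the body model with a learned model. The function name `find_contacts`, the 2 cm default threshold, and the toy point sets are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def find_contacts(
    body_points: Sequence[Point],
    object_points: Sequence[Point],
    threshold: float = 0.02,  # assumed contact distance in metres (hypothetical)
) -> List[Tuple[int, int, float]]:
    """Return (object_index, body_index, distance) for each object point
    whose nearest body point is within `threshold`."""
    contacts = []
    for obj_idx, obj_pt in enumerate(object_points):
        # Brute-force nearest-neighbour search over the body surface points.
        best_idx, best_dist = -1, math.inf
        for body_idx, body_pt in enumerate(body_points):
            d = math.dist(obj_pt, body_pt)
            if d < best_dist:
                best_idx, best_dist = body_idx, d
        if best_dist <= threshold:
            contacts.append((obj_idx, best_idx, best_dist))
    return contacts


# Toy data: three body surface points and two object points (made up).
body = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.3, 1.2, 0.0)]
obj = [(0.31, 1.2, 0.0), (2.0, 2.0, 2.0)]
contacts = find_contacts(body, obj)
print(contacts)
```

In this toy case only the first object point is reported, paired with the third body point. Because each contact is stored as a pair of indices, a correspondence to a fixed body-model surface would let contacts be compared across frames and subjects; for real meshes, a spatial index such as a k-d tree would replace the brute-force loop.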