ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
This work addresses the problem of enabling machines to understand dexterous bimanual hand-object manipulation, a topic relevant to robotics and computer vision researchers. The contribution is incremental in that it builds on existing dataset efforts.
The authors tackle the lack of datasets with ground-truth 3D annotations for studying physically consistent hand-object manipulation. They introduce ARCTIC, a dataset of 2.1M video frames with accurate 3D meshes and contact information for bimanual interactions, and propose two novel tasks, each with a baseline model evaluated on the dataset.
Humans intuitively understand that inanimate objects do not move by themselves, but that state changes are typically caused by human manipulation (e.g., the opening of a book). This is not yet the case for machines. In part this is because there exist no datasets with ground-truth 3D annotations for the study of physically consistent and synchronised motion of hands and articulated objects. To this end, we introduce ARCTIC -- a dataset of two hands that dexterously manipulate objects, containing 2.1M video frames paired with accurate 3D hand and object meshes and detailed, dynamic contact information. It contains bimanual articulation of objects such as scissors or laptops, where hand poses and object states evolve jointly in time. We propose two novel articulated hand-object interaction tasks: (1) Consistent motion reconstruction: Given a monocular video, the goal is to reconstruct two hands and articulated objects in 3D, so that their motions are spatio-temporally consistent. (2) Interaction field estimation: Dense relative hand-object distances must be estimated from images. We introduce two baselines, ArcticNet for the first task and InterField for the second, and evaluate them qualitatively and quantitatively on ARCTIC. Our code and data are available at https://arctic.is.tue.mpg.de.
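
To make "dense relative hand-object distances" concrete, here is a minimal sketch of how such a distance field could be computed from ground-truth geometry. It assumes the field stores, for every vertex of one mesh, the Euclidean distance to the nearest vertex of the other mesh; the abstract does not spell out this definition, so it is an illustrative assumption. The function name `interaction_field` and the toy vertex lists are hypothetical and not part of the released ARCTIC code.

```python
import math

def interaction_field(src_vertices, dst_vertices):
    """Return, for each source vertex, the distance to the closest destination vertex.

    Assumed definition (not taken from the abstract): nearest-vertex Euclidean distance.
    Brute force, O(len(src) * len(dst)); real meshes would call for a spatial index.
    """
    if not dst_vertices:
        raise ValueError("dst_vertices must contain at least one vertex")
    return [min(math.dist(s, d) for d in dst_vertices) for s in src_vertices]

# Toy stand-ins for hand and object mesh vertices (x, y, z), in arbitrary units.
hand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
obj = [(0.0, 0.0, 2.0), (1.0, 3.0, 0.0)]

hand_to_obj = interaction_field(hand, obj)  # one distance per hand vertex
obj_to_hand = interaction_field(obj, hand)  # one distance per object vertex
```

The two directions generally differ, which is why the sketch computes both. Under this assumed definition, a model for the second task would regress such per-vertex distances from images rather than from known meshes.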