Motion Capture from Inertial and Vision Sensors
This work addresses the need for consumer-friendly motion capture solutions, offering a valuable dataset and method for researchers and developers in computer vision and graphics.
The paper tackles the problem of affordable human motion capture for personal applications by introducing MINIONS, a large-scale dataset with over five million frames and 400 minutes of multi-modal data from inertial and vision sensors, and proposes a SparseNet framework that leverages these sensors to achieve accurate motion capture.
Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial measurement units (IMUs) for accurate multi-modal human motion capture in daily life, we contribute MINIONS in this paper, a large-scale Motion capture dataset collected from INertial and visION Sensors. MINIONS has several featured properties: 1) large scale of over five million frames and 400 minutes duration; 2) multi-modality data of IMUs signals and RGB videos labeled with joint positions, joint rotations, SMPL parameters, etc.; 3) a diverse set of 146 fine-grained single and interactive actions with textual descriptions. With the proposed MINIONS dataset, we propose a SparseNet framework to capture human motion from IMUs and videos by discovering their supplementary features and exploring the possibilities of consumer-affordable motion capture using a monocular camera and very few IMUs. The experiment results emphasize the unique advantages of inertial and vision sensors, showcasing the promise of consumer-affordable multi-modal motion capture and providing a valuable resource for further research and development.