Real-Time Human Motion Capture with Multiple Depth Cameras
The method offers an efficient and inexpensive solution for real-time motion capture, with applications such as animation and gaming. The contribution is incremental in that it builds on existing depth-camera and image segmentation techniques.
The paper tackles markerless human motion capture using multiple Kinect sensors. It localizes body parts in real time without intrusive markers and reports results that exceed the prior state of the art on the Berkeley MHAD dataset.
Commonly used human motion capture systems require intrusive attachment of markers that are visually tracked with multiple cameras. In this work we present an efficient and inexpensive solution to markerless motion capture using only a few Kinect sensors. Unlike previous work on 3D pose estimation using a single depth camera, we relax constraints on the camera location and do not assume a cooperative user. We apply recent image segmentation techniques to depth images and use curriculum learning to train our system on purely synthetic data. Our method accurately localizes body parts without requiring an explicit shape model. The body joint locations are then recovered in real time by combining evidence from multiple views. We also introduce a dataset of ~6 million synthetic depth frames for pose estimation from multiple cameras and exceed state-of-the-art results on the Berkeley MHAD dataset.
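
As a rough illustration of what combining evidence from multiple views can look like, the sketch below fuses per-camera estimates of a single joint into one world-space position. It is not the paper's method: the abstract does not specify the fusion rule, so a confidence-weighted mean is assumed here. The names `camera_to_world` and `fuse_joint`, the per-camera confidence score, and the availability of calibrated extrinsics (rotation R and translation t per camera) are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def camera_to_world(point: Vec3, rotation: Mat3, translation: Vec3) -> Vec3:
    """Map a point from a camera frame to the shared world frame: R * p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def fuse_joint(observations: List[Tuple[Vec3, Mat3, Vec3, float]]) -> Vec3:
    """Confidence-weighted mean of one joint seen by several cameras.

    Each observation is (point_in_camera_frame, R, t, confidence).
    The weighting scheme is an assumption made for this sketch.
    """
    total = sum(conf for _, _, _, conf in observations)
    if total <= 0.0:
        raise ValueError("no camera observed this joint with positive confidence")
    fused = [0.0, 0.0, 0.0]
    for point, rotation, translation, conf in observations:
        world = camera_to_world(point, rotation, translation)
        for i in range(3):
            fused[i] += conf * world[i] / total
    return tuple(fused)
```

A camera that sees a joint poorly (for example, because the limb is occluded from its viewpoint) would be given a low confidence and so contribute little to the fused position, which is one plausible reason for using several sensors placed around the subject.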