Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture
This work addresses human motion capture for applications such as animation and biomechanics. The contribution is incremental: it builds on existing CNN-based methods and adds a new refinement strategy.
The paper tackles markerless motion capture with a multi-stage CNN-based approach that applies 3D reasoning at every stage to refine human pose estimates, recover from earlier errors, and exploit image cues. It also shows that the multi-camera output can be used to improve single-camera models.
We propose a CNN-based approach for multi-camera markerless motion capture of the human body. Unlike existing methods that first perform pose estimation on individual cameras and generate 3D models as post-processing, our approach applies 3D reasoning throughout a multi-stage pipeline. This novelty allows us to use provisional 3D models of human pose to rethink where the joints should be located in the image and to recover from past mistakes. Our principled refinement of 3D human poses lets us make use of image cues, even from images where we previously misdetected joints, to refine our estimates as part of an end-to-end approach. Finally, we demonstrate how the high-quality output of our multi-camera setup can be used as an additional training source to improve the accuracy of existing single-camera models.
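The idea of using a provisional 3D estimate to revisit a misdetected joint can be illustrated with classical multi-view geometry. The sketch below is not the paper's CNN pipeline. As stand-ins it uses linear (DLT) triangulation and a leave-one-out consistency check, and every name in it (`make_camera`, `refine_joint`, the toy three-camera rig, the simulated 40-pixel error) is a hypothetical chosen for illustration. It assumes calibrated cameras and a single joint with one 2D detection per view.

```python
import numpy as np


def make_camera(focal, cx, cy, center):
    """Toy projection matrix P = K [I | -C] for an axis-aligned camera."""
    K = np.array([[focal, 0.0, cx], [0.0, focal, cy], [0.0, 0.0, 1.0]])
    Rt = np.hstack([np.eye(3), -np.asarray(center, dtype=float).reshape(3, 1)])
    return K @ Rt


def project(P, X):
    """Project a 3D point X into the image of the camera with matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one joint from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The homogeneous solution is the right singular vector belonging to
    # the smallest singular value.
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]


def refine_joint(projections, detections):
    """Leave-one-out refinement (illustrative stand-in, assumption).

    Drops each view in turn, triangulates from the rest, and keeps the
    subset with the smallest mean reprojection error. Returns the 3D
    point, the index of the dropped view, and the 3D point reprojected
    into that view as a corrected 2D location.
    """
    best = None
    n = len(projections)
    for drop in range(n):
        keep = [i for i in range(n) if i != drop]
        X = triangulate_dlt([projections[i] for i in keep],
                            [detections[i] for i in keep])
        residual = np.mean([
            np.linalg.norm(project(projections[i], X) - detections[i])
            for i in keep
        ])
        if best is None or residual < best[0]:
            best = (residual, X, drop)
    _, X, drop = best
    return X, drop, project(projections[drop], X)


# Toy rig: three cameras on a horizontal baseline, all looking along +Z.
cameras = [make_camera(1000.0, 320.0, 240.0, c)
           for c in ([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])]
joint = np.array([0.2, -0.1, 4.0])  # ground-truth joint position (metres)
detections = [project(P, joint) for P in cameras]
detections[2] = detections[2] + np.array([0.0, 40.0])  # simulated misdetection

X, dropped, corrected = refine_joint(cameras, detections)
print("dropped view:", dropped)
print("refined 3D joint:", X)
print("corrected 2D location in dropped view:", corrected)
```

In this toy setting the corrected 2D location indicates where the image evidence in the faulty view could be re-examined. A learned multi-stage system would presumably replace both the hard view rejection and the linear triangulation with differentiable, image-driven components.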