Monocular Real-time Full Body Capture with Inter-part Correlations
This work addresses the problem of real-time, comprehensive 3D human capture from a single image for applications requiring detailed body, hand, and face reconstruction.
This paper introduces a real-time full body capture method from a single color image, simultaneously estimating the shape and motion of the body, hands, and a dynamic 3D face model. It achieves competitive accuracy on public benchmarks while being significantly faster and providing more complete face reconstructions.
We present the first method for real-time full body capture that estimates the shape and motion of body and hands together with a dynamic 3D face model from a single color image. Our approach uses a new neural network architecture that exploits correlations between body and hands with high computational efficiency. Unlike previous works, our approach is jointly trained on multiple datasets that each focus on hand, body, or face separately. It does not require data in which all parts are annotated simultaneously, which is much harder to create with sufficient variety. This multi-dataset training enables superior generalization. In contrast to earlier monocular full body methods, our approach captures more expressive 3D face geometry and color by estimating the shape, expression, albedo, and illumination parameters of a statistical face model. Our method achieves competitive accuracy on public benchmarks, while being significantly faster and providing more complete face reconstructions.
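One common way to train on datasets that each annotate only one part is to mask the loss per sample, so that a body-only image contributes no hand or face gradient. The sketch below illustrates that idea with NumPy. It is not taken from the paper: the function name `masked_multitask_loss`, the part names, the mean squared error, and the equal weighting of parts are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical part names; the actual outputs of the network are not specified here.
PARTS = ("body", "hand", "face")

def masked_multitask_loss(pred, target, has_label):
    """Sum of per-part mean squared errors over labelled samples only.

    pred, target: dict mapping part name -> array of shape (batch, dim).
    has_label:    dict mapping part name -> boolean array of shape (batch,),
                  True where the sample's source dataset annotates that part.
    Returns (total_loss, per_part_losses).
    """
    per_part = {}
    for part in PARTS:
        mask = np.asarray(has_label[part], dtype=bool)
        # Per-sample squared error, averaged over the parameter dimension.
        err = ((pred[part] - target[part]) ** 2).mean(axis=1)
        # Boolean indexing drops unlabelled rows entirely, so placeholder
        # values (even NaN) in their targets cannot leak into the loss.
        per_part[part] = float(err[mask].mean()) if mask.any() else 0.0
    # Assumption: all parts are weighted equally.
    total = sum(per_part.values())
    return total, per_part
```

With a mixed batch drawn from a body dataset and a hand dataset, each sample contributes only to the term for the part it annotates, and a part with no labelled samples in the batch contributes zero.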