Revitalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation
This work addresses the computational bottleneck of real-time motion capture for applications such as animation and VR; its contribution is incremental in that it speeds up existing optimization methods.
The paper tackles slow optimization in 3D human pose and shape estimation by proposing a sparse constrained formulation, which achieves real-time performance (4 ms average convergence time) with accuracy competitive with state-of-the-art methods.
We propose a novel sparse constrained formulation and from it derive a real-time optimization method for 3D human pose and shape estimation. Our optimization method, SCOPE (Sparse Constrained Optimization for 3D human Pose and shapE estimation), is orders of magnitude faster (avg. 4 ms convergence) than existing optimization methods, while being mathematically equivalent to their dense unconstrained formulation under mild assumptions. We achieve this by exploiting the underlying sparsity and constraints of our formulation to efficiently compute the Gauss-Newton direction. We show that this computation scales linearly with the number of joints and measurements of a complex 3D human model, in contrast to prior work where it scales cubically due to their dense unconstrained formulation. Based on our optimization method, we present a real-time motion capture framework that estimates 3D human poses and shapes from a single image at over 30 FPS. In benchmarks against state-of-the-art methods on multiple public datasets, our framework outperforms other optimization methods and achieves competitive accuracy against regression methods. Project page with code and videos: https://sites.google.com/view/scope-human/.
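To make the scaling claim concrete, the following is a minimal sketch of why sparsity matters when computing a Gauss-Newton direction. It is not the SCOPE algorithm: it assumes a toy chain of scalar variables in which each variable has one measurement residual and one coupling residual to its neighbour, so the normal-equation matrix is tridiagonal. The function names (`solve_tridiagonal`, `chain_normal_equations`, `dense_matrix`) and the weight `w` are hypothetical and chosen only for illustration. The structured solve costs O(n), whereas a generic dense solve of the same system costs O(n^3).

```python
import numpy as np


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: solve a tridiagonal system in O(n) time.

    lower[i] is the entry at (i, i-1); lower[0] is unused.
    upper[i] is the entry at (i, i+1); upper[-1] is unused.
    """
    n = len(diag)
    c = np.zeros(n)  # modified upper diagonal
    d = np.zeros(n)  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def chain_normal_equations(x, z, w):
    """Build J^T J (as three diagonals) and -J^T r for a toy chain.

    Residuals (toy assumption): x_i - z_i and w * (x_{i+1} - x_i).
    """
    n = len(x)
    diag = np.ones(n)          # from the measurement residuals
    diag[:-1] += w**2          # coupling residual seen from its left variable
    diag[1:] += w**2           # coupling residual seen from its right variable
    off = np.full(n, -(w**2))  # off-diagonal entries of J^T J
    g = x - z                  # gradient J^T r, measurement part
    s = w * (x[1:] - x[:-1])   # coupling residual values
    g[:-1] -= w * s
    g[1:] += w * s
    return off, diag, off.copy(), -g


def dense_matrix(lower, diag, upper):
    """Assemble the same system densely, for comparison only."""
    return np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)


rng = np.random.default_rng(0)
n = 24  # hypothetical number of chain variables
x = rng.normal(size=n)
z = rng.normal(size=n)

lower, diag, upper, rhs = chain_normal_equations(x, z, w=2.0)
dx_sparse = solve_tridiagonal(lower, diag, upper, rhs)               # O(n)
dx_dense = np.linalg.solve(dense_matrix(lower, diag, upper), rhs)    # O(n^3)
print(np.allclose(dx_sparse, dx_dense))
```

Both solves return the same Gauss-Newton direction; only the cost differs. A real articulated body model has a tree-structured rather than chain-structured sparsity pattern and vector-valued joints, so this sketch illustrates the principle, not the paper's derivation.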