A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation
This work addresses occlusion handling in generative pose estimation and is aimed at computer vision researchers. It offers a versatile scene model applicable to tasks such as multi-object pose estimation and human motion capture. The contribution is incremental: it builds on existing generative methods and adds a novel visibility formulation.
The paper tackles occlusion handling in generative 3D reconstruction by introducing a new scene representation that enables an analytically differentiable, closed-form formulation of surface visibility. The result is smooth pose similarity energies with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of numerical optimization.
Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a major challenge, since the visibility function that indicates whether a surface point is seen from a camera often cannot be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable, closed-form formulation of surface visibility. In contrast to previous methods, this yields smooth, analytically differentiable, and efficient-to-optimize pose similarity energies with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of numerical optimization. The underlying idea is a new image formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution, which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation.
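To illustrate how a Gaussian density can turn visibility into a smooth quantity, the following is a minimal Python sketch, not the paper's exact formulation. It assumes isotropic Gaussians, a unit-length ray direction, and exponential (Beer-Lambert style) attenuation along the ray. The function name `transmittance` and the `(center, sigma, density)` tuple layout are hypothetical choices made for this example. Under these assumptions the line integral of each Gaussian along the ray has a closed form in terms of the error function, so the resulting visibility is smooth in the Gaussian centers.

```python
import math

def transmittance(origin, direction, depth, gaussians):
    """Fraction of light surviving from `origin` to distance `depth`
    along a ray with unit-length `direction`.

    `gaussians` is a list of (center, sigma, density) tuples, an
    illustrative layout assumed for this sketch.
    """
    optical_depth = 0.0
    for center, sigma, density in gaussians:
        diff = [c - o for c, o in zip(center, origin)]
        # Ray parameter of the point closest to the Gaussian center.
        t0 = sum(d * x for d, x in zip(direction, diff))
        # Squared distance from the center to the ray (clamped for rounding).
        r2 = max(sum(x * x for x in diff) - t0 * t0, 0.0)
        k = sigma * math.sqrt(2.0)
        # Closed-form integral of exp(-(t - t0)^2 / (2 sigma^2)) over [0, depth].
        line_integral = sigma * math.sqrt(math.pi / 2.0) * (
            math.erf((depth - t0) / k) - math.erf(-t0 / k)
        )
        optical_depth += density * math.exp(-r2 / (2.0 * sigma ** 2)) * line_integral
    # Assumed attenuation model: visibility decays exponentially
    # with the accumulated density.
    return math.exp(-optical_depth)

# One blob centered two units in front of the camera.
blob = [((0.0, 0.0, 2.0), 0.3, 5.0)]
in_front = transmittance((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0, blob)
behind = transmittance((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 3.0, blob)
```

In this sketch a point in front of the blob remains almost fully visible, while a point behind it is largely occluded, and the value changes continuously as the blob moves. A hard-surface visibility test would instead jump between 0 and 1 at the occlusion boundary.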