Deep Non-Rigid Structure from Motion
The work targets a known bottleneck in non-rigid structure from motion: limited scalability and limited shape complexity have restricted its practical utility in vision applications. The proposed method enables larger and more complex reconstructions.
The paper tackles the limitations of non-rigid structure from motion (NRSfM) in handling large image collections and complex shape variability. It proposes a deep neural network that recovers camera poses and 3D points from 2D image coordinates alone, and reports precision and robustness an order of magnitude better than state-of-the-art methods.
Current non-rigid structure from motion (NRSfM) algorithms are limited mainly in two respects: (i) the number of images, and (ii) the type of shape variability they can handle. These limits have hampered the practical utility of NRSfM in many vision applications. In this paper we propose a novel deep neural network that recovers camera poses and 3D points solely from an ensemble of 2D image coordinates. The proposed network is mathematically interpretable as a multi-layer block sparse dictionary learning problem, and can handle problems of unprecedented scale and shape complexity. Extensive experiments show that our approach outperforms all available state-of-the-art methods in precision and robustness by an order of magnitude. We further propose a quality measure, based on the network weights, that estimates confidence in a reconstruction without requiring 3D ground truth.
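To make the problem setup concrete, the following is a minimal sketch of the forward model that NRSfM methods typically invert: a 3D shape is formed as a sparse linear combination of basis shapes, rotated by a camera, and projected to 2D. It is not the proposed network. The orthographic camera, the single-layer (rather than multi-layer block sparse) dictionary, and all function and variable names are assumptions made for illustration.

```python
import math
import random


def rotation_matrix(yaw, pitch):
    """Return a 3x3 rotation: rotate about y by `yaw`, then about x by `pitch`."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return [[sum(rx[i][k] * ry[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def shape_from_code(dictionary, code):
    """Combine basis shapes linearly: S = sum_k code[k] * dictionary[k]."""
    num_points = len(dictionary[0])
    shape = [[0.0, 0.0, 0.0] for _ in range(num_points)]
    for weight, basis in zip(code, dictionary):
        if weight == 0.0:
            continue  # sparse code: most basis shapes are inactive
        for p in range(num_points):
            for axis in range(3):
                shape[p][axis] += weight * basis[p][axis]
    return shape


def project_orthographic(shape, rotation):
    """Rotate each 3D point and keep only x and y (assumed orthographic camera)."""
    return [[sum(rotation[r][a] * point[a] for a in range(3)) for r in range(2)]
            for point in shape]


def synthesize_frame(dictionary, rng, active=2):
    """Draw a sparse code and a random camera; return (2D points, 3D shape, rotation)."""
    code = [0.0] * len(dictionary)
    for k in rng.sample(range(len(dictionary)), active):
        code[k] = rng.uniform(-1.0, 1.0)
    rotation = rotation_matrix(rng.uniform(-math.pi, math.pi),
                               rng.uniform(-0.5, 0.5))
    shape = shape_from_code(dictionary, code)
    return project_orthographic(shape, rotation), shape, rotation


rng = random.Random(0)
# Hypothetical dictionary: 6 basis shapes, each with 5 points in 3D.
dictionary = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
              for _ in range(6)]
points_2d, shape_3d, rotation = synthesize_frame(dictionary, rng)
```

An NRSfM method receives only `points_2d` for each frame and must estimate both `shape_3d` and `rotation`. The problem is ill-posed without a prior on the shapes, which is the role a sparse dictionary plays in this sketch.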