DeepTracking-Net: 3D Tracking with Unsupervised Learning of Continuous Flow
This addresses the challenge of generalizing deep learning for 3D tracking, which is important for applications like motion analysis and robotics, though it appears incremental as it builds on existing unsupervised and 3D matching approaches.
The paper tackles the problem of 3D tracking for time-varying shapes by proposing DeepTracking-Net, an unsupervised framework that uses a temporal-aware descriptor to regress continuous displacement fields, and it outperforms supervised state-of-the-art methods on simulated and real datasets.
This paper deals with the problem of 3D tracking, i.e., to find dense correspondences in a sequence of time-varying 3D shapes. Despite deep learning approaches have achieved promising performance for pairwise dense 3D shapes matching, it is a great challenge to generalize those approaches for the tracking of 3D time-varying geometries. In this paper, we aim at handling the problem of 3D tracking, which provides the tracking of the consecutive frames of 3D shapes. We propose a novel unsupervised 3D shape registration framework named DeepTracking-Net, which uses the deep neural networks (DNNs) as auxiliary functions to produce spatially and temporally continuous displacement fields for 3D tracking of objects in a temporal order. Our key novelty is that we present a novel temporal-aware correspondence descriptor (TCD) that captures spatio-temporal essence from consecutive 3D point cloud frames. Specifically, our DeepTracking-Net starts with optimizing a randomly initialized latent TCD. The TCD is then decoded to regress a continuous flow (i.e. a displacement vector field) which assigns a motion vector to every point of time-varying 3D shapes. Our DeepTracking-Net jointly optimizes TCDs and DNNs' weights towards the minimization of an unsupervised alignment loss. Experiments on both simulated and real data sets demonstrate that our unsupervised DeepTracking-Net outperforms the current supervised state-of-the-art method. In addition, we prepare a new synthetic 3D data, named SynMotions, to the 3D tracking and recognition community.