Deep Deformation Network for Object Landmark Localization
This work addresses landmark localization for non-rigid objects in computer vision. It offers an end-to-end pipeline that is easier to train and produces better results, though it appears incremental because it builds on prior cascaded networks.
The authors tackle landmark localization in non-rigid objects by proposing a deep deformation network (DDN) that incorporates geometric constraints, achieving state-of-the-art performance on benchmarks for facial landmark localization, human body pose estimation, and bird part localization.
We propose a novel cascaded framework, namely the deep deformation network (DDN), for localizing landmarks in non-rigid objects. The hallmarks of DDN are its incorporation of geometric constraints within a convolutional neural network (CNN) framework, its ease and efficiency of training, and its generality of application. A novel shape basis network (SBN) forms the first stage of the cascade, in which landmarks are initialized by combining the benefits of CNN features and a learned shape basis to reduce the complexity of the highly nonlinear pose manifold. In the second stage, a point transformer network (PTN) estimates local deformation, parameterized as a thin-plate spline transformation, for finer refinement. Our framework incorporates neither handcrafted features nor part connectivity, which enables an end-to-end shape prediction pipeline during both training and testing. In contrast to prior cascaded networks for landmark localization that learn a mapping from feature space to landmark locations, we demonstrate that the regularization induced through geometric priors in the DDN makes it easier to train, yet produces superior results. The efficacy and generality of the architecture are demonstrated through state-of-the-art performance on several benchmarks for multiple tasks such as facial landmark localization, human body pose estimation, and bird part localization.
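The two geometric ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the shape basis is obtained by PCA over training shapes, and it fits the thin-plate spline (TPS) parameters from known point correspondences, whereas in the DDN both the basis coefficients and the TPS parameters would be predicted by networks from CNN features. All function names here are hypothetical.

```python
import numpy as np

def learn_shape_basis(shapes, k):
    """Assumed PCA basis from shapes of size (n_samples, n_landmarks, 2)."""
    n, m, _ = shapes.shape
    flat = shapes.reshape(n, 2 * m)
    mean = flat.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:k]  # shapes (2m,) and (k, 2m)

def shape_from_coeffs(mean, basis, coeffs):
    """Stage 1 analogue: mean shape plus a linear combination of basis vectors."""
    return (mean + coeffs @ basis).reshape(-1, 2)

def tps_kernel(r2):
    """TPS radial basis U = r^2 log(r^2), taken as 0 at r = 0."""
    return np.where(r2 > 0, r2 * np.log(np.maximum(r2, 1e-12)), 0.0)

def fit_tps(control, target):
    """Solve the standard TPS linear system mapping control points to targets.

    In this sketch the parameters come from correspondences; a network
    would regress them instead.
    """
    m = control.shape[0]
    d2 = ((control[:, None, :] - control[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((m, 1)), control])  # affine part, (m, 3)
    A = np.zeros((m + 3, m + 3))
    A[:m, :m] = tps_kernel(d2)
    A[:m, m:] = P
    A[m:, :m] = P.T
    b = np.zeros((m + 3, 2))
    b[:m] = target
    return np.linalg.solve(A, b)  # (m + 3, 2): m kernel weights, 3 affine rows

def apply_tps(params, control, points):
    """Stage 2 analogue: warp points with the fitted TPS deformation."""
    d2 = ((points[:, None, :] - control[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return tps_kernel(d2) @ params[:-3] + P @ params[-3:]
```

A refinement step under these assumptions would take the landmarks returned by shape_from_coeffs and pass them through apply_tps. Restricting the first stage to k coefficients is what constrains the predicted shape to a low-dimensional subspace, and the TPS warp then adds smooth local deformation on top of it.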