CV · Apr 6, 2017

A Convolution Tree with Deconvolution Branches: Exploiting Geometric Relationships for Single Shot Keypoint Detection

arXiv:1704.01880v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the specific problem of facial keypoint detection in computer vision, offering a single-shot method that improves accuracy for in-the-wild images, though it appears incremental as it builds on existing DCNN approaches.

The paper tackles the problem of capturing geometric relationships among facial keypoints in deep convolutional networks by proposing a novel convolution-deconvolution network with learnable transform functions and a pose-based routing function, achieving improved accuracy on challenging datasets like AFW and AFLW.

Recently, deep convolutional neural networks (DCNNs) have been applied to face alignment and have shown potential for learning improved feature representations. Although deeper layers can capture abstract concepts such as pose, it is difficult to capture the geometric relationships among keypoints in DCNNs. In this paper, we propose a novel convolution-deconvolution network for facial keypoint detection. Our model predicts the 2D locations of the keypoints and their individual visibility, along with 3D head pose, while exploiting the spatial relationships among different keypoints. Unlike existing approaches to modeling these relationships, we propose learnable transform functions that capture the relationships between keypoints at the feature level. However, due to extensive variations in pose, not all of these relationships act at once; hence we propose a pose-based routing function that implicitly models the active relationships. Both the transform functions and the routing function are implemented through convolutions in a multi-task framework. Our approach is a single-shot keypoint detection method, which distinguishes it from many existing cascaded regression-based methods. We also show that learning these relationships significantly improves the accuracy of keypoint detection for in-the-wild face images from challenging datasets such as AFW and AFLW.
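To make the idea of transform functions and pose-based routing concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: the function names (`conv2d_same`, `refine`), the hand-set shift kernels, the softmax gating, and the toy 3x5 feature maps are all assumptions chosen for illustration. In the paper both components are learned through convolutions and the routing is implicit. The sketch only shows the general pattern: each source keypoint's feature map is passed through a convolution (the "transform") and added to the target keypoint's map, weighted by a gate that depends on pose (the "routing").

```python
import math

def conv2d_same(fmap, kernel):
    """Zero-padded 2D cross-correlation (the 'convolution' used in deep
    learning frameworks); the output has the same size as the input."""
    h, w = len(fmap), len(fmap[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - ph, x + dx - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += fmap[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def refine(features, kernels, route_logits, target):
    """Hypothetical refinement step (an assumption, not the paper's layer):
    target map + sum over sources of gate * conv(source map, kernel)."""
    sources = [name for name in features if name != target]
    gates = softmax([route_logits[(s, target)] for s in sources])
    out = [row[:] for row in features[target]]
    for gate, src in zip(gates, sources):
        msg = conv2d_same(features[src], kernels[(src, target)])
        for y in range(len(out)):
            for x in range(len(out[0])):
                out[y][x] += gate * msg[y][x]
    return out

def one_hot(h, w, y, x):
    grid = [[0.0] * w for _ in range(h)]
    grid[y][x] = 1.0
    return grid

# Toy 3x5 maps. The nose map is empty; the left eye peaks one cell left of
# the true nose position (1, 2). The right eye is treated as occluded, so
# its map peaks at a spurious location.
features = {
    "nose": [[0.0] * 5 for _ in range(3)],
    "left_eye": one_hot(3, 5, 1, 1),
    "right_eye": one_hot(3, 5, 0, 4),
}

# Hand-set stand-ins for learned transform kernels.
shift_right = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]  # out[y][x] = in[y][x-1]
shift_left = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]   # out[y][x] = in[y][x+1]
kernels = {("left_eye", "nose"): shift_right,
           ("right_eye", "nose"): shift_left}

# Equal logits: both relations are treated as active.
equal_gates = refine(features, kernels,
                     {("left_eye", "nose"): 0.0, ("right_eye", "nose"): 0.0},
                     "nose")

# Pose-dependent logits (assumed values): the right-eye relation is
# suppressed, as it might be for a profile view.
routed = refine(features, kernels,
                {("left_eye", "nose"): 0.0, ("right_eye", "nose"): -20.0},
                "nose")

print("equal gates:", equal_gates[1][2], equal_gates[0][3])
print("routed:     ", routed[1][2], routed[0][3])
```

With equal gates, the spurious right-eye evidence contributes as much to the nose map as the reliable left-eye evidence. Once the gate for the right-eye relation is pushed down, the refined nose map is dominated by the left-eye message at the correct location, which is the intuition behind not letting all relationships act at once.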

