CV · Jan 23, 2018

Numerical Coordinate Regression with Convolutional Neural Networks

arXiv:1801.07372v2 · 234 citations
AI Analysis

The paper addresses a bottleneck in pose estimation and similar tasks by offering a more efficient and accurate alternative to current state-of-the-art methods, though the improvement is incremental.

The paper tackles the problem of numerical coordinate regression from images by proposing a differentiable spatial to numerical transform (DSNT) layer, which improves prediction accuracy across tested model architectures compared to existing heatmap matching methods.

We study deep learning approaches to inferring numerical coordinates for points of interest in an input image. Existing convolutional neural network-based solutions to this problem either take a heatmap matching approach or regress to coordinates with a fully connected output layer. Neither of these approaches is ideal, since the former is not entirely differentiable, and the latter lacks inherent spatial generalization. We propose our differentiable spatial to numerical transform (DSNT) to fill this gap. The DSNT layer adds no trainable parameters, is fully differentiable, and exhibits good spatial generalization. Unlike heatmap matching, DSNT works well with low heatmap resolutions, so it can be dropped in as an output layer for a wide range of existing fully convolutional architectures. Consequently, DSNT offers a better trade-off between inference speed and prediction accuracy compared to existing techniques. When used to replace the popular heatmap matching approach used in almost all state-of-the-art methods for pose estimation, DSNT gives better prediction accuracy for all model architectures tested.
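The core idea in the abstract, turning a heatmap into numerical coordinates with a parameter-free, differentiable operation, can be sketched as an expected value over pixel positions. The following is a minimal illustration rather than the authors' implementation. It assumes the heatmap is normalised with a softmax and that pixel centres are mapped to the range (-1, 1); neither detail is stated in the abstract. The function names `softmax2d` and `dsnt` are hypothetical.

```python
import math


def softmax2d(logits):
    """Normalise a 2D grid of scores so entries are non-negative and sum to 1.

    Assumption: softmax is only one possible normalisation choice.
    """
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def dsnt(heatmap):
    """Return the expected (x, y) location under a normalised heatmap.

    Assumption: pixel centres are scaled to (-1, 1), so the top-left pixel
    maps to negative x and negative y. No trainable parameters are involved.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    x = y = 0.0
    for i, row in enumerate(heatmap):
        for j, p in enumerate(row):
            x += p * (2 * (j + 1) - (cols + 1)) / cols  # probability-weighted x
            y += p * (2 * (i + 1) - (rows + 1)) / rows  # probability-weighted y
    return x, y


# A 5x5 score map with one strong activation at row 1, column 3 (0-based).
logits = [[0.0] * 5 for _ in range(5)]
logits[1][3] = 20.0
x, y = dsnt(softmax2d(logits))
print(x, y)  # approximately 0.4 and -0.4
```

Because the output is a weighted sum of fixed coordinate values, gradients flow from a coordinate loss back to every heatmap pixel, and the result is not restricted to the pixel grid, which is consistent with the abstract's claim that the approach tolerates low heatmap resolutions.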

Code Implementations: 2 repos