CV · Apr 1, 2020

Articulation-aware Canonical Surface Mapping

arXiv:2004.00614v3 · 108 citations
AI Analysis

This paper addresses 3D shape understanding from images. It is an incremental improvement over prior work in that it removes the need for keypoint supervision.

The paper tackles predicting canonical surface mappings and inferring articulation and pose from 2D images without keypoint annotations. It shows that the method learns from image collections using only foreground mask labels, and that enforcing geometric consistency among the predictions yields more accurate results.

We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image. While previous approaches rely on keypoint supervision for learning, we present an approach that can learn without such annotations. Our key insight is that these tasks are geometrically related, and we can obtain supervisory signal via enforcing consistency among the predictions. We present results across a diverse set of animal object categories, showing that our method can learn articulation and CSM prediction from image collections using only foreground mask labels for training. We empirically show that allowing articulation helps learn more accurate CSM prediction, and that enforcing the consistency with predicted CSM is similarly critical for learning meaningful articulation.
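The consistency idea in the abstract can be illustrated with a small sketch: if a pixel is mapped to a 3D point on the template, then projecting that point with the predicted pose should land back on the same pixel. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The weak-perspective camera model, the squared-error penalty, and the names `project` and `cycle_consistency_loss` are all hypothetical choices made for this example, and articulation (deforming the template before projection) is left out for brevity.

```python
import numpy as np


def project(points_3d, rotation, scale, translation):
    """Project 3D points to 2D with a weak-perspective camera (assumed model).

    points_3d:   (N, 3) points on the template surface
    rotation:    (3, 3) rotation matrix
    scale:       scalar scale factor
    translation: (2,) image-plane offset
    """
    rotated = points_3d @ rotation.T
    # Drop the depth coordinate, then scale and shift in the image plane.
    return scale * rotated[:, :2] + translation


def cycle_consistency_loss(pixels, csm_points, rotation, scale, translation, mask):
    """Mean squared reprojection error over foreground pixels.

    pixels:     (N, 2) pixel coordinates
    csm_points: (N, 3) template points predicted for those pixels (the CSM)
    mask:       (N,) foreground indicator; background pixels are ignored
    """
    reprojected = project(csm_points, rotation, scale, translation)
    sq_err = np.sum((reprojected - pixels) ** 2, axis=1)
    mask = mask.astype(float)
    # Guard against an empty mask to avoid dividing by zero.
    return float(np.sum(sq_err * mask) / max(np.sum(mask), 1.0))


if __name__ == "__main__":
    rotation = np.eye(3)
    translation = np.array([0.5, -0.5])
    template_points = np.array([[0.1, 0.2, 0.3], [-0.4, 0.5, 0.0], [0.0, -0.2, 0.7]])
    mask = np.array([1, 1, 0])

    # Pixels that agree exactly with the pose give zero loss.
    pixels = project(template_points, rotation, 2.0, translation)
    print(cycle_consistency_loss(pixels, template_points, rotation, 2.0, translation, mask))

    # Shifting the pixels breaks the agreement and the loss becomes positive.
    print(cycle_consistency_loss(pixels + 1.0, template_points, rotation, 2.0, translation, mask))
```

In this toy setup the loss is zero only when the per-pixel surface mapping and the pose agree, which is the sense in which each prediction can supervise the other; the foreground mask restricts the penalty to object pixels.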

Code Implementations: 1 repo