CV · Mar 29, 2019

FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image

arXiv:1903.12305v1 · 50 citations
Originality: Incremental advance
AI Analysis

The paper addresses a novel problem in 3D computer vision that is relevant to tasks such as AR and robotics, though it builds incrementally on existing methods for direction field synthesis and surface normal prediction.

The paper tackles the problem of predicting dense canonical 3D coordinate frames from a single RGB image, achieving results that improve surface normal estimation and enable applications like feature matching and augmented reality.

In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image. We observe that each pixel in an image corresponds to a surface in the underlying 3D geometry, where a canonical frame can be represented by three orthogonal axes, one along its normal direction and two in its tangent plane. We propose an algorithm to predict these axes from RGB. Our first insight is that canonical frames computed automatically with recently introduced direction field synthesis methods can provide training data for the task. Our second insight is that networks designed for surface normal prediction provide better results when trained jointly to predict canonical frames, and even better when trained to also predict 2D projections of canonical frames. We conjecture this is because projections of canonical tangent directions often align with local gradients in images, and because those directions are tightly linked to 3D canonical frames through projective geometry and orthogonality constraints. In our experiments, we find that our method predicts 3D canonical frames that can be used in applications ranging from surface normal estimation to feature matching and augmented reality.
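
As a rough illustration of the geometry described in the abstract (not the authors' implementation), the sketch below builds an orthonormal frame from a surface normal and an approximate tangent direction using Gram-Schmidt, then computes the 2D image direction of a tangent axis. The function names `canonical_frame` and `project_direction`, the pinhole camera looking along +z, and the `focal` parameter are all assumptions made for this example; in the paper the tangent directions come from direction field synthesis rather than from a hand-picked hint.

```python
import numpy as np

def canonical_frame(normal, tangent_hint):
    """Return an orthonormal frame (t1, t2, n) at a surface point.

    Hypothetical helper: `tangent_hint` is any vector not parallel to
    `normal`; it stands in for a direction-field tangent.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Gram-Schmidt: remove the component of the hint along the normal.
    t1 = np.asarray(tangent_hint, dtype=float)
    t1 = t1 - np.dot(t1, n) * n
    t1 = t1 / np.linalg.norm(t1)
    # The second tangent completes a right-handed frame (t1 x t2 = n).
    t2 = np.cross(n, t1)
    return t1, t2, n

def project_direction(point, direction, focal=1.0):
    """Unit 2D image direction of a 3D direction at a 3D point.

    Assumes a pinhole camera u = focal * x / z, v = focal * y / z and
    differentiates that projection along `direction`.
    """
    x, y, z = point
    dx, dy, dz = direction
    d2 = focal * np.array([dx * z - x * dz, dy * z - y * dz]) / z**2
    norm = np.linalg.norm(d2)
    # A direction pointing straight along the viewing ray projects to zero.
    return d2 / norm if norm > 0 else d2

# Example: a fronto-parallel surface patch two units in front of the camera.
t1, t2, n = canonical_frame([0.0, 0.0, 2.0], [1.0, 1.0, 1.0])
t1_2d = project_direction([0.0, 0.0, 2.0], t1)
```

The three returned axes are mutually orthogonal by construction, which is the orthogonality constraint the abstract refers to. A hint parallel to the normal would make the Gram-Schmidt step degenerate, so a real pipeline would need to guard against that case.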

Code Implementations: 1 repo