CV · Dec 13, 2022

Accidental Turntables: Learning 3D Pose by Watching Objects Turn

arXiv:2212.06300v1 · 3 citations · h-index: 50
Originality: Incremental advance
AI Analysis

This paper gives computer vision researchers a label-free method for 3D pose estimation. The advance is incremental, since it builds on existing structure-from-motion and detection techniques.

The paper tackles 3D object pose estimation by learning from in-the-wild videos in which objects turn. It achieves competitive performance on standard benchmarks without requiring pose labels during training, and it contributes a dataset of 41,212 car images.

We propose a technique for learning single-view 3D object pose estimation models by utilizing a new source of data -- in-the-wild videos where objects turn. Such videos are prevalent in practice (e.g., cars in roundabouts, airplanes near runways) and easy to collect. We show that classical structure-from-motion algorithms, coupled with recent advances in instance detection and feature matching, provide surprisingly accurate relative 3D pose estimation on such videos. We propose a multi-stage training scheme that first learns a canonical pose across a collection of videos and then supervises a model for single-view pose estimation. The proposed technique achieves competitive performance with respect to the existing state of the art on standard benchmarks for 3D pose estimation, without requiring any pose labels during training. We also contribute an Accidental Turntables Dataset, a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes, which serves as a benchmark for 3D pose estimation.
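To make the notion of "relative 3D pose" between frames concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a structure-from-motion tool has already returned a 3x3 rotation matrix per frame, and the function names (`rotation_about_y`, `relative_rotation`, `geodesic_angle_deg`) are hypothetical helpers introduced only for illustration. It composes two per-frame rotations into a relative rotation and measures its angle with the standard geodesic distance on rotations.

```python
import numpy as np


def rotation_about_y(angle_rad):
    """Rotation matrix for a turn of angle_rad about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def relative_rotation(R_i, R_j):
    """Relative rotation between two frames, R_j * R_i^T (assumed convention)."""
    return R_j @ R_i.T


def geodesic_angle_deg(R_a, R_b):
    """Angle in degrees of the rotation that takes R_a to R_b."""
    R = R_a.T @ R_b
    # Clip guards against tiny numerical overshoot outside [-1, 1].
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


# Toy stand-ins for two per-frame rotations of a turning object.
R_0 = rotation_about_y(np.radians(10.0))
R_1 = rotation_about_y(np.radians(40.0))

R_rel = relative_rotation(R_0, R_1)
turn_deg = geodesic_angle_deg(np.eye(3), R_rel)
print(f"object turned by about {turn_deg:.1f} degrees between the two frames")
```

The same geodesic angle is a common way to score a predicted rotation against a reference one. Whether the paper uses exactly this convention for composing rotations is an assumption here.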

