CV
Nov 17, 2016

Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation

arXiv:1611.05708v3
50 citations
Originality: Incremental advance
AI Analysis

This paper addresses accurate 3D human pose estimation from a single image, with applications such as robotics and AR. It is an incremental advance that combines existing approaches through learned fusion.

The paper tackles monocular 3D human pose estimation with an architecture that performs 2D and 3D regression simultaneously and combines them through a trainable fusion scheme. The authors report significant improvements over the state of the art on standard benchmarks.

Most recent approaches to monocular 3D human pose estimation rely on Deep Learning. They typically involve regressing from an image to either 3D joint coordinates directly or 2D joint locations from which 3D coordinates are inferred. Both approaches have their strengths and weaknesses and we therefore propose a novel architecture designed to deliver the best of both worlds by performing both simultaneously and fusing the information along the way. At the heart of our framework is a trainable fusion scheme that learns how to fuse the information optimally instead of being hand-designed. This yields significant improvements upon the state-of-the-art on standard 3D human pose estimation benchmarks.
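To make the idea of a trainable, rather than hand-designed, fusion concrete, here is a minimal NumPy sketch. It is not the authors' architecture: the names `fuse` and `fit_gate`, the per-channel sigmoid gate, and the synthetic "2D stream" and "3D stream" features are all assumptions made for illustration. It only shows how a blending weight between two information sources can be learned by gradient descent instead of being fixed by hand.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse(feat_2d, feat_3d, gate_logits):
    """Blend two feature arrays with a per-channel gate in (0, 1).

    Hypothetical helper: a gate near 1 trusts the 2D stream,
    a gate near 0 trusts the 3D stream.
    """
    gate = sigmoid(gate_logits)
    return gate * feat_2d + (1.0 - gate) * feat_3d


def fit_gate(feat_2d, feat_3d, target, steps=500, lr=0.5):
    """Learn the gate logits by gradient descent on squared error."""
    gate_logits = np.zeros(feat_2d.shape[1])
    for _ in range(steps):
        gate = sigmoid(gate_logits)
        err = fuse(feat_2d, feat_3d, gate_logits) - target
        # Gradient of the per-channel mean squared error w.r.t. the gate.
        grad_gate = 2.0 * np.mean(err * (feat_2d - feat_3d), axis=0)
        # Chain rule through the sigmoid to reach the logits.
        gate_logits -= lr * grad_gate * gate * (1.0 - gate)
    return gate_logits


# Synthetic data (an assumption): each stream is reliable on one channel only.
rng = np.random.default_rng(0)
target = rng.normal(size=(256, 2))
feat_2d = target + rng.normal(scale=[0.05, 1.0], size=(256, 2))
feat_3d = target + rng.normal(scale=[1.0, 0.05], size=(256, 2))

gate_logits = fit_gate(feat_2d, feat_3d, target)
learned_gate = sigmoid(gate_logits)
fused = fuse(feat_2d, feat_3d, gate_logits)
```

In this toy setup the learned gate moves toward the 2D stream on the channel where that stream is less noisy and toward the 3D stream on the other, so the fused features track the target more closely than either stream alone. A real network would learn such weights jointly with the rest of its parameters.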

Code Implementations: 1 repo
