CV · Apr 29, 2020

Motion Guided 3D Pose Estimation from Videos

arXiv:2004.13985v1 · 246 citations
AI Analysis

This work addresses monocular 3D human pose estimation from video, offering incremental improvements through a new loss function and network design.

The paper tackles monocular 3D human pose estimation from videos by introducing a motion loss and a U-shaped GCN architecture, achieving state-of-the-art results on benchmarks like Human3.6M and MPI-INF-3DHP with improved smoothness and motion recovery.

We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D pose. In computing motion loss, a simple yet effective representation for keypoint motion, called pairwise motion encoding, is introduced. We design a new graph convolutional network architecture, U-shaped GCN (UGCN). It captures both short-term and long-term motion information to fully leverage the additional supervision from the motion loss. We train UGCN with the motion loss on two large-scale benchmarks: Human3.6M and MPI-INF-3DHP. Our model surpasses other state-of-the-art models by a large margin. It also demonstrates strong capacity in producing smooth 3D sequences and recovering keypoint motion.
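
To make the idea of a motion loss built on a pairwise motion encoding more concrete, here is a minimal NumPy sketch. The abstract does not give the formula, so everything specific here is an assumption: the encoding is taken to be the cross product of a keypoint's 3D coordinates at two frames separated by an interval `tau`, the loss is taken to be the mean absolute difference between predicted and ground-truth encodings, and the function names and the default `intervals` are hypothetical. It should be read as one plausible reading, not as the authors' implementation.

```python
import numpy as np

def pairwise_motion_encoding(seq, tau):
    """Encode keypoint motion between frames t and t + tau.

    seq: array of shape (T, J, 3), i.e. T frames, J keypoints, xyz coordinates.
    Returns an array of shape (T - tau, J, 3).
    ASSUMPTION: the encoding is a cross product of the two positions.
    """
    return np.cross(seq[:-tau], seq[tau:])

def motion_loss(pred, target, intervals=(1, 2, 4)):
    """Average L1 distance between predicted and target motion encodings.

    ASSUMPTION: the intervals and the L1 reduction are illustrative choices.
    Every interval must be smaller than the number of frames T.
    """
    total = 0.0
    for tau in intervals:
        diff = (pairwise_motion_encoding(pred, tau)
                - pairwise_motion_encoding(target, tau))
        total += np.abs(diff).mean()
    return total / len(intervals)
```

Under these assumptions, a prediction that matches the ground truth in every frame yields zero loss, while a prediction that is frozen in place is penalized whenever the ground truth moves, which is the kind of extra supervision on motion that the abstract describes. Using several intervals is meant to mirror the short-term and long-term motion information mentioned above.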

Code Implementations: 1 repo
