CV, Jan 20

On the Role of Rotation Equivariance in Monocular 3D Human Pose Estimation

arXiv:2601.13913v1
h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses a specific challenge in 3D human pose estimation for computer vision applications, offering an incremental improvement over existing methods.

The paper studies monocular 3D human pose estimation and shows that common 2D-to-3D lifting models fail on rotated inputs. It reports that learning 2D rotation equivariance through augmentation improves performance on rotated poses and outperforms state-of-the-art equivariant-by-design methods.

Estimating 3D from 2D is one of the central tasks in computer vision. In this work, we consider the monocular setting, i.e. single-view input, for 3D human pose estimation (HPE). Here, the task is to predict a 3D point set of human skeletal joints from a single 2D input image. While this is by definition an ill-posed problem, recent work has presented methods that solve it with an error of up to several centimetres. Typically, these methods employ a two-step approach: the first step detects the 2D skeletal joints in the input image, and the second step lifts them from 2D to 3D. We find that common lifting models fail when encountering a rotated input. We argue that learning a single human pose along with its in-plane rotations is considerably easier and more geometrically grounded than directly learning a point-to-point mapping. Furthermore, our intuition is that endowing the model with the notion of rotation equivariance without explicitly constraining its parameter space should lead to a more straightforward learning process than one with equivariance by design. Using the common HPE benchmarks, we confirm that 2D rotation equivariance per se improves model performance on human poses akin to rotations in the image plane, and that it can be learned efficiently and straightforwardly by augmentation, outperforming state-of-the-art equivariant-by-design methods.
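
The abstract does not spell out the augmentation, so the following is only a minimal sketch of what rotation augmentation for a lifting model could look like, not the authors' implementation. It assumes 2D joints are expressed relative to the principal point, 3D joints are in camera coordinates with the z-axis along the optical axis, and a pinhole camera; under these assumptions, rotating the 3D pose about the z-axis rotates its 2D projection by the same angle. All function names (`rotate_xy`, `augment_pair`, `equivariance_gap`, `project`) are hypothetical.

```python
import math
import random


def rotate_xy(points, angle_rad):
    """Rotate the (x, y) part of each point by angle_rad; any further
    coordinates (e.g. depth z) are passed through unchanged."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    rotated = []
    for x, y, *rest in points:
        rotated.append((c * x - s * y, s * x + c * y, *rest))
    return rotated


def augment_pair(pose_2d, pose_3d, rng=random):
    """Assumed augmentation: draw one in-plane angle and apply it to both
    the 2D input and the 3D target, so the pair stays consistent."""
    angle = rng.uniform(-math.pi, math.pi)
    return rotate_xy(pose_2d, angle), rotate_xy(pose_3d, angle)


def equivariance_gap(lift, pose_2d, angle_rad):
    """Largest coordinate difference between lift(rotate(x)) and
    rotate(lift(x)); zero for a lifter that is exactly equivariant."""
    lifted_then_rotated = rotate_xy(lift(pose_2d), angle_rad)
    rotated_then_lifted = lift(rotate_xy(pose_2d, angle_rad))
    return max(
        abs(a - b)
        for p, q in zip(lifted_then_rotated, rotated_then_lifted)
        for a, b in zip(p, q)
    )


def project(pose_3d, focal=1.0):
    """Pinhole projection with the principal point at the origin (assumption)."""
    return [(focal * x / z, focal * y / z) for x, y, z in pose_3d]
```

Here `lift` stands for any 2D-to-3D lifting function that maps a list of `(x, y)` joints to a list of `(x, y, z)` joints. Evaluating `equivariance_gap` over a range of angles is one way to probe how a trained lifter behaves on rotated inputs.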
