CV · Jul 19, 2023

Learning from Abstract Images: on the Importance of Occlusion in a Minimalist Encoding of Human Poses

arXiv:2307.09893v1 · h-index: 2
AI Analysis

This paper addresses camera-viewpoint dependency in pose estimation for computer vision applications, though it appears incremental because it builds on existing 2D keypoint methods.

The paper tackles the problem of poor cross-dataset performance in 2D-to-3D pose lifting by proposing a novel representation using opaque 3D limbs that preserves occlusion information, resulting in a 'quantum leap' in cross-dataset benchmarks.

Existing 2D-to-3D pose lifting networks suffer from poor performance in cross-dataset benchmarks. Although the use of 2D keypoints joined by "stick-figure" limbs has shown promise as an intermediate step, stick-figures do not account for occlusion information that is often inherent in an image. In this paper, we propose a novel representation using opaque 3D limbs that preserves occlusion information while implicitly encoding joint locations. Crucially, when training on data with accurate three-dimensional keypoints and without part-maps, this representation allows training on abstract synthetic images, with occlusion, from as many synthetic viewpoints as desired. The result is a pose defined by limb angles rather than joint positions – because poses are, in the real world, independent of cameras – allowing us to predict poses that are completely independent of camera viewpoint. The result provides not only an improvement in same-dataset benchmarks, but a "quantum leap" in cross-dataset benchmarks.
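
To make the idea of an occlusion-preserving abstract image concrete, here is a minimal sketch in Python with NumPy. It is not the paper's renderer. It assumes an orthographic camera that rotates about the vertical axis, draws each limb as a flat capsule of fixed pixel radius rather than a true 3D volume, and treats smaller z as closer to the camera. The function name `render_opaque_limbs`, the toy two-limb skeleton, and all parameters are hypothetical. A depth buffer decides which limb is visible at each pixel, so nearer limbs hide farther ones, and the same 3D pose can be rendered from any chosen viewpoint.

```python
import numpy as np


def render_opaque_limbs(joints, limbs, yaw=0.0, size=64, radius=2.0):
    """Render limbs as opaque capsules into a label image with a z-buffer.

    joints: (J, 3) array with x, y roughly in [-1, 1] (assumed convention).
    limbs:  list of (joint_a, joint_b) index pairs.
    yaw:    camera rotation about the vertical (y) axis, in radians.
    Returns a (size, size) int image: 0 = background, k = k-th limb (1-based).
    """
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    cam = np.asarray(joints, dtype=float) @ rot.T

    # Orthographic projection to pixel coordinates; z is kept as depth.
    px = (cam[:, 0] + 1.0) * 0.5 * (size - 1)
    py = (1.0 - cam[:, 1]) * 0.5 * (size - 1)
    depth = cam[:, 2]

    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    image = np.zeros((size, size), dtype=np.int32)
    zbuf = np.full((size, size), np.inf)

    for label, (a, b) in enumerate(limbs, start=1):
        dx, dy = px[b] - px[a], py[b] - py[a]
        length_sq = max(dx * dx + dy * dy, 1e-12)
        # Parameter of the closest point on the 2D segment for every pixel.
        t = np.clip(((xs - px[a]) * dx + (ys - py[a]) * dy) / length_sq, 0.0, 1.0)
        dist = np.hypot(xs - (px[a] + t * dx), ys - (py[a] + t * dy))
        # Depth interpolated along the limb at that closest point.
        z = depth[a] + t * (depth[b] - depth[a])
        # A pixel is painted only if this limb is nearer than what is there.
        visible = (dist <= radius) & (z < zbuf)
        image[visible] = label
        zbuf[visible] = z[visible]
    return image


# Toy example: a near horizontal limb crossing a far vertical limb.
joints = np.array([
    [-0.5, 0.0, -0.5], [0.5, 0.0, -0.5],   # limb 1 (nearer at yaw = 0)
    [0.0, -0.5, 0.5], [0.0, 0.5, 0.5],     # limb 2 (farther at yaw = 0)
])
limbs = [(0, 1), (2, 3)]
front = render_opaque_limbs(joints, limbs, yaw=0.0)
back = render_opaque_limbs(joints, limbs, yaw=np.pi)
```

In this toy case, limb 1 covers the crossing point when viewed from the front, and limb 2 covers it when the camera is rotated by half a turn. A stick-figure drawn without depth ordering would look identical in both views, which is the ambiguity the opaque representation is meant to remove.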

