CV | Apr 17, 2025

Unsupervised Cross-Domain 3D Human Pose Estimation via Pseudo-Label-Guided Global Transforms

arXiv:2504.12699v2 | h-index: 8 | IEEE Transactions on Circuits and Systems for Video Technology (Print)
Originality: Incremental advance
AI Analysis

This paper addresses domain adaptation for 3D human pose estimation, which is crucial for applications such as robotics and AR/VR. The advance is incremental: it builds on existing methods while adding novel components.

The paper tackles the problem of domain shift in cross-scenario 3D human pose estimation by proposing a framework that uses pseudo-labels and global transformations to align poses across domains, achieving state-of-the-art performance on benchmarks like Human3.6M, MPI-INF-3DHP, and 3DPW, even outperforming target-trained models.

Existing 3D human pose estimation methods often lose accuracy when applied to cross-scenario inference because of domain shifts in characteristics such as camera viewpoint, position, posture, and body size. Among these factors, camera viewpoints and locations have been shown to contribute significantly to the domain gap by influencing the global positions of human poses. To address this, we propose a novel framework that explicitly conducts global transformations between pose positions in the camera coordinate systems of the source and target domains. We start with a Pseudo-Label Generation Module that is applied to the 2D poses of the target dataset to generate pseudo-3D poses. Then, a Global Transformation Module leverages a human-centered coordinate system as a novel bridging mechanism to align the positions and orientations of poses across disparate domains, ensuring consistent spatial referencing. To further enhance generalization, a Pose Augmentor is incorporated to address variations in human posture and body size. This process is iterative, allowing refined pseudo-labels to progressively improve guidance for domain adaptation. Our method is evaluated on various cross-dataset benchmarks, including Human3.6M, MPI-INF-3DHP, and 3DPW. The proposed method outperforms state-of-the-art approaches and even outperforms the target-trained model.
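
The abstract does not spell out how the Global Transformation Module is formulated, so the following is only a minimal NumPy sketch of one plausible reading of a "human-centered coordinate system" bridge. It expresses a source pose in a body-attached frame, then re-places it at the global position and orientation of a target pseudo-3D pose. The joint indices (`PELVIS`, `R_HIP`, `L_HIP`, `NECK`), the function names, and the choice of hips and spine to define the axes are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

# Hypothetical joint indices (an assumption, not from the paper).
PELVIS, R_HIP, L_HIP, NECK = 0, 1, 2, 3


def human_frame(pose):
    """Return (R, t) for a (J, 3) pose given in camera coordinates.

    t is the pelvis position; the columns of R are the body axes
    (left-right, up, forward) expressed in camera coordinates.
    """
    t = pose[PELVIS]
    x = pose[L_HIP] - pose[R_HIP]      # left-right axis across the hips
    x = x / np.linalg.norm(x)
    up = pose[NECK] - t                # rough spine direction
    z = np.cross(x, up)                # forward axis, normal to the torso plane
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                 # up axis, made orthogonal to x
    return np.stack([x, y, z], axis=1), t


def to_human_centered(pose):
    """Express every joint in the pose's own body frame."""
    R, t = human_frame(pose)
    return (pose - t) @ R              # row-wise R^T (p - t)


def transfer_global(source_pose, target_pseudo_pose):
    """Place the source pose at the target pose's global position and
    orientation, keeping the source posture and body size unchanged."""
    local = to_human_centered(source_pose)
    R_t, t_t = human_frame(target_pseudo_pose)
    return local @ R_t.T + t_t         # row-wise R_t p_local + t_t
```

In this sketch only the rigid global placement changes: bone lengths and joint angles of the source pose are preserved, which matches the abstract's split between global transforms and a separate Pose Augmentor for posture and body size.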

