CV · AI · Jun 17, 2025

PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation

arXiv:2506.14596v1 · Has Code
Originality: Highly original
AI Analysis

This work improves 3D pose estimation for applications such as animation and robotics by introducing a new fusion method, though it is an incremental advance over existing graph-based approaches.

The paper addresses monocular 3D human pose estimation. It targets the directional and angular correlations within the skeleton that existing methods overlook, an omission that leads to implausible poses under occlusion or rapid motion. The proposed framework, PoseGRAF, reports state-of-the-art results on the Human3.6M and MPI-INF-3DHP datasets.

Existing monocular 3D pose estimation methods primarily rely on joint positional features, while overlooking intrinsic directional and angular correlations within the skeleton. As a result, they often produce implausible poses under joint occlusions or rapid motion changes. To address these challenges, we propose the PoseGRAF framework. We first construct a dual graph convolutional structure that separately processes joint and bone graphs, effectively capturing their local dependencies. A Cross-Attention module is then introduced to model interdependencies between bone directions and joint features. Building upon this, a dynamic fusion module is designed to adaptively integrate both feature types by leveraging the relational dependencies between joints and bones. An improved Transformer encoder is further incorporated in a residual manner to generate the final output. Experimental results on the Human3.6M and MPI-INF-3DHP datasets show that our method exceeds state-of-the-art approaches. Additional evaluations on in-the-wild videos further validate its generalizability. The code is publicly available at https://github.com/iCityLab/PoseGRAF.
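
The data flow described in the abstract (separate joint and bone graphs, cross-attention between them, then an adaptive fusion) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is in the linked repository. The 5-joint toy skeleton, the random weights, the choice to let joints attend to bones, the sigmoid gate used for fusion, and every function name below are assumptions made for illustration, and the Transformer encoder stage is omitted.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton: (parent, child) pairs, one per bone.
BONES = [(0, 1), (1, 2), (0, 3), (3, 4)]
NUM_JOINTS = 5

def normalized_adjacency(edges, n):
    """Symmetric adjacency with self-loops, normalized as D^-1/2 (A+I) D^-1/2."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d[:, None] * d[None, :]

def bone_graph_edges(bones):
    """Assumed bone-graph rule: two bones are neighbours when they share a joint."""
    return [(p, q) for p in range(len(bones)) for q in range(p + 1, len(bones))
            if set(bones[p]) & set(bones[q])]

def bone_directions(joints, bones):
    """Unit vectors pointing from each bone's parent joint to its child joint."""
    vec = np.stack([joints[c] - joints[p] for p, c in bones])
    return vec / (np.linalg.norm(vec, axis=1, keepdims=True) + 1e-8)

def gcn_layer(adj, x, w):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj @ x @ w, 0.0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention (here: joints attend to bones)."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ keys_values

def gated_fusion(joint_feat, bone_ctx, w_gate):
    """Assumed fusion: a per-feature sigmoid gate mixes joint and bone features."""
    logits = np.concatenate([joint_feat, bone_ctx], axis=1) @ w_gate
    gate = 1.0 / (1.0 + np.exp(-logits))
    return gate * joint_feat + (1.0 - gate) * bone_ctx

rng = np.random.default_rng(0)
dim = 8
joints_2d = rng.normal(size=(NUM_JOINTS, 2))    # stand-in for detected 2D keypoints
directions = bone_directions(joints_2d, BONES)  # shape (4, 2)

# Dual graphs: one over joints, one over bones.
adj_joint = normalized_adjacency(BONES, NUM_JOINTS)
adj_bone = normalized_adjacency(bone_graph_edges(BONES), len(BONES))

joint_feat = gcn_layer(adj_joint, joints_2d, rng.normal(size=(2, dim)))
bone_feat = gcn_layer(adj_bone, directions, rng.normal(size=(2, dim)))

bone_ctx = cross_attention(joint_feat, bone_feat)  # bone context per joint, (5, 8)
fused = gated_fusion(joint_feat, bone_ctx, rng.normal(size=(2 * dim, dim)))
```

In this sketch the gate is computed per joint and per feature, so the mix between positional and directional information can differ across joints. That is one simple way to read "adaptively integrate"; the paper's dynamic fusion module may be structured differently.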

Code Implementations: 1 repo
