CV · LG · Feb 16, 2024

Occlusion Resilient 3D Human Pose Estimation

arXiv:2402.11036v1 · 2 citations · h-index: 15 · 3DV
Originality: Incremental advance
AI Analysis

This paper addresses occlusions in 3D pose estimation, which matters for applications such as motion capture and surveillance, but it appears incremental because it builds on existing temporal-consistency methods.

The paper tackles occlusions in 3D human pose estimation from single-camera video by introducing a refinement network that performs graph convolutions and is trained with binary masks that disable edges to simulate occlusions. It claims effectiveness relative to state-of-the-art techniques, though the abstract gives no concrete numbers.

Occlusions remain one of the key challenges in 3D body pose estimation from single-camera video sequences. Temporal consistency has been extensively used to mitigate their impact, but the existing algorithms in the literature do not explicitly model them. Here, we do so by representing the deforming body as a spatio-temporal graph. We then introduce a refinement network that performs graph convolutions over this graph to output 3D poses. To ensure robustness to occlusions, we train this network with a set of binary masks that we use to disable some of the edges, as in dropout techniques. In effect, we simulate the fact that some joints can be hidden for periods of time and train the network to be immune to that. We demonstrate the effectiveness of this approach compared to state-of-the-art techniques that infer poses from single-camera sequences.
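The masking idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the three-joint skeleton, the choice to hide a single joint for one contiguous run of frames, the self-loops, and the row normalisation are all assumptions made for illustration; the abstract only states that binary masks disable some edges of a spatio-temporal graph during training.

```python
import numpy as np

def spatio_temporal_adjacency(bones, num_joints, num_frames):
    """Adjacency over T*J nodes: bone edges inside each frame,
    plus edges linking the same joint in consecutive frames."""
    n = num_joints * num_frames
    adj = np.zeros((n, n), dtype=np.float32)
    for t in range(num_frames):
        base = t * num_joints
        for i, j in bones:  # spatial (bone) edges
            adj[base + i, base + j] = adj[base + j, base + i] = 1.0
        if t + 1 < num_frames:  # temporal edges to the next frame
            for j in range(num_joints):
                a, b = base + j, base + num_joints + j
                adj[a, b] = adj[b, a] = 1.0
    return adj

def occlusion_edge_mask(num_joints, num_frames, rng, max_len=3):
    """Binary edge mask hiding one random joint for a contiguous run
    of frames (an assumed occlusion model, not taken from the paper)."""
    visible = np.ones(num_joints * num_frames, dtype=np.float32)
    joint = rng.integers(num_joints)
    length = rng.integers(1, min(max_len, num_frames) + 1)
    start = rng.integers(0, num_frames - length + 1)
    for t in range(start, start + length):
        visible[t * num_joints + joint] = 0.0
    # An edge survives only if both of its endpoints are visible.
    return np.outer(visible, visible)

def masked_graph_conv(x, adj, mask, weight):
    """One graph-convolution step over the masked graph.
    Self-loops and row normalisation are assumptions of this sketch."""
    a = adj * mask + np.eye(adj.shape[0], dtype=adj.dtype)
    a = a / a.sum(axis=1, keepdims=True)
    return a @ x @ weight

# Hypothetical 3-joint chain observed over 4 frames.
bones = [(0, 1), (1, 2)]
num_joints, num_frames = 3, 4
rng = np.random.default_rng(0)
adj = spatio_temporal_adjacency(bones, num_joints, num_frames)
mask = occlusion_edge_mask(num_joints, num_frames, rng)
poses = rng.normal(size=(num_joints * num_frames, 3)).astype(np.float32)
refined = masked_graph_conv(poses, adj, mask, np.eye(3, dtype=np.float32))
```

Drawing a fresh mask for every training batch would expose the network to many different occlusion patterns, which is the dropout-like effect the abstract describes. In this sketch a hidden joint keeps only its self-loop, so it neither sends information to nor receives information from its neighbours for the masked frames.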

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
