CV · Aug 29, 2025

Efficient Diffusion-Based 3D Human Pose Estimation with Hierarchical Temporal Pruning

arXiv:2508.21363v1 · 1 citation · h-index: 4 · IEEE Transactions on Circuits and Systems for Video Technology (Print)
Originality: Incremental advance
AI Analysis

This work targets the computational cost faced by researchers and practitioners who use diffusion models for 3D human pose estimation. It is an incremental advance because it builds on existing diffusion-based methods.

The paper tackles the high computational cost of diffusion-based 3D human pose estimation by proposing a Hierarchical Temporal Pruning (HTP) strategy. Compared to prior diffusion-based methods, HTP reduces training MACs by 38.5% and inference MACs by 56.8%, and improves inference speed by an average of 81.1%, while achieving state-of-the-art performance on Human3.6M and MPI-INF-3DHP.

Diffusion models have demonstrated strong capabilities in generating high-fidelity 3D human poses, yet their iterative nature and multi-hypothesis requirements incur substantial computational cost. In this paper, we propose an Efficient Diffusion-Based 3D Human Pose Estimation framework with a Hierarchical Temporal Pruning (HTP) strategy, which dynamically prunes redundant pose tokens across both frame and semantic levels while preserving critical motion dynamics. HTP operates in a staged, top-down manner: (1) Temporal Correlation-Enhanced Pruning (TCEP) identifies essential frames by analyzing inter-frame motion correlations through adaptive temporal graph construction; (2) Sparse-Focused Temporal MHSA (SFT MHSA) leverages the resulting frame-level sparsity to reduce attention computation, focusing on motion-relevant tokens; and (3) Mask-Guided Pose Token Pruner (MGPTP) performs fine-grained semantic pruning via clustering, retaining only the most informative pose tokens. Experiments on Human3.6M and MPI-INF-3DHP show that HTP reduces training MACs by 38.5%, inference MACs by 56.8%, and improves inference speed by an average of 81.1% compared to prior diffusion-based methods, while achieving state-of-the-art performance.
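To make the first two stages concrete, here is a minimal NumPy sketch of the general idea: score frames by how redundant they are in a temporal similarity graph, keep the least redundant ones, and run attention only over the kept frames. This is an illustration under stated assumptions, not the paper's TCEP or SFT MHSA. The function names `prune_frames` and `sparse_temporal_attention`, the cosine-similarity graph, the mean-similarity redundancy score, the single attention head without learned projections, and the sizes (243 frames, 81 kept, 32-dimensional tokens) are all hypothetical choices made for this example.

```python
import numpy as np


def prune_frames(tokens, keep):
    """Return indices of the `keep` least redundant frames of a (T, D) array.

    Assumption: redundancy is the mean cosine similarity to all other frames.
    The paper's adaptive temporal graph is not reproduced here.
    """
    # Normalise each frame token so that dot products are cosine similarities.
    unit = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T  # dense (T, T) temporal similarity graph
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    redundancy = sim.sum(axis=1) / (len(tokens) - 1)
    # Pick the lowest-redundancy frames, then restore temporal order.
    return np.sort(np.argsort(redundancy, kind="stable")[:keep])


def sparse_temporal_attention(tokens, kept):
    """Single-head self-attention restricted to the kept frames.

    Assumption: queries, keys, and values are the raw tokens (no projections).
    """
    x = tokens[kept]
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x


rng = np.random.default_rng(0)
T, D = 243, 32  # hypothetical sequence length and token width
tokens = rng.normal(size=(T, D))
kept = prune_frames(tokens, keep=81)
out = sparse_temporal_attention(tokens, kept)
print(out.shape)  # (81, 32)
```

In this sketch the attention score matrix shrinks from 243 × 243 to 81 × 81 entries, which is where the saving from frame-level sparsity comes from. The third stage (clustering-based pruning of the remaining tokens) is omitted.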

