CV · May 27, 2025

Spectral Compression Transformer with Line Pose Graph for Monocular 3D Human Pose Estimation

Zenghao Zheng, Lianping Yang, Hegui Zhu, Mingrui Ye

arXiv:2505.21309v16.25 citations · h-index: 2 · Pattern Recognition

Originality: Incremental advance

AI Analysis

This work targets both computational efficiency and accuracy in monocular 3D human pose estimation. It is an incremental advance: it builds on existing transformer-based methods and adds spectral sequence compression and structural input enhancements.

The paper tackles the high computational cost and frame-to-frame redundancy of transformer-based 3D human pose estimation. It introduces the Spectral Compression Transformer (SCT), which shortens the sequence to accelerate computation, and reports state-of-the-art performance with an MPJPE of 37.7 mm on Human3.6M at low computational cost.
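For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is commonly defined as the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below shows that common definition only; the function name `mpjpe` and the array layout are assumptions for illustration, and the paper's exact evaluation protocol (for example, any root-joint alignment) is not described on this page.

```python
import numpy as np


def mpjpe(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean per-joint position error.

    Both arrays are assumed to have shape (frames, joints, 3) and to be
    expressed in the same unit (millimetres for Human3.6M results).
    """
    # Euclidean distance per joint, then the mean over all joints and frames.
    per_joint_error = np.linalg.norm(pred - target, axis=-1)
    return float(per_joint_error.mean())


if __name__ == "__main__":
    # Hypothetical toy data: 2 frames, 17 joints, every joint offset by (3, 4, 0).
    target = np.zeros((2, 17, 3))
    pred = target + np.array([3.0, 4.0, 0.0])
    print(mpjpe(pred, target))
```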

Transformer-based 3D human pose estimation methods suffer from high computational costs due to the quadratic complexity of self-attention with respect to sequence length. Additionally, pose sequences often contain significant redundancy between frames. However, recent methods typically fail to improve model capacity while effectively eliminating sequence redundancy. In this work, we introduce the Spectral Compression Transformer (SCT) to reduce sequence length and accelerate computation. The SCT encoder treats hidden features between blocks as Temporal Feature Signals (TFS) and applies the Discrete Cosine Transform, a Fourier transform-based technique, to determine the spectral components to be retained. By filtering out certain high-frequency noise components, SCT compresses the sequence length and reduces redundancy. To further enrich the input sequence with prior structural information, we propose the Line Pose Graph (LPG) based on line graph theory. The LPG generates skeletal position information that complements the input 2D joint positions, thereby improving the model's performance. Finally, we design a dual-stream network architecture to effectively model spatial joint relationships and the compressed motion trajectory within the pose sequence. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that our model achieves state-of-the-art performance with improved computational efficiency. For example, on the Human3.6M dataset, our method achieves an MPJPE of 37.7mm while maintaining a low computational cost. Furthermore, we perform ablation studies on each module to assess its effectiveness. The code and models will be released.
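The compression idea in the abstract can be illustrated with a minimal sketch: apply a Discrete Cosine Transform along the time axis of a feature sequence and keep only the lowest-frequency coefficients, so that later layers see a shorter sequence. This is not the authors' implementation. The function names, the orthonormal DCT-II basis, the fixed `keep` count, and the reconstruction step are all assumptions made for illustration; how SCT actually selects the retained components is not specified on this page.

```python
import numpy as np


def dct_matrix(n_frames: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine frequency."""
    n = np.arange(n_frames)
    k = n.reshape(-1, 1)
    basis = np.sqrt(2.0 / n_frames) * np.cos(
        np.pi * (2 * n + 1) * k / (2 * n_frames)
    )
    basis[0] /= np.sqrt(2.0)  # rescale the constant row so all rows have unit norm
    return basis


def compress(features: np.ndarray, keep: int) -> np.ndarray:
    """Map (frames, channels) features to (keep, channels) low-frequency coefficients."""
    basis = dct_matrix(features.shape[0])
    return basis[:keep] @ features


def reconstruct(coeffs: np.ndarray, n_frames: int) -> np.ndarray:
    """Approximate the original sequence from the retained coefficients."""
    basis = dct_matrix(n_frames)
    return basis[: coeffs.shape[0]].T @ coeffs


if __name__ == "__main__":
    # Hypothetical smooth trajectory: 81 frames, 4 feature channels.
    t = np.linspace(0.0, 1.0, 81).reshape(-1, 1)
    features = np.sin(2 * np.pi * t * np.array([0.5, 1.0, 1.5, 2.0]))
    coeffs = compress(features, keep=9)  # sequence length 81 -> 9
    approx = reconstruct(coeffs, n_frames=81)
    print(coeffs.shape, float(np.abs(features - approx).max()))
```

Because self-attention cost grows quadratically with sequence length, shortening the sequence this way reduces the attention cost accordingly, at the price of discarding high-frequency detail.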

