CV · Mar 14, 2025

Online Test-time Adaptation for 3D Human Pose Estimation: A Practical Perspective with Estimated 2D Poses

arXiv:2503.11194v1 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses a practical challenge for real-time video applications by enabling adaptation with noisy inputs, but the advance is incremental because it builds on existing test-time adaptation methods.

This paper tackles the problem of adapting 3D human pose estimation models to streaming videos using estimated 2D poses instead of ground truth, proposing adaptive aggregation, two-stage optimization, and local augmentation to handle estimation errors, and reports surpassing state-of-the-art by a large margin.

Online test-time adaptation for 3D human pose estimation is used for video streams that differ from training data. Ground truth 2D poses are used for adaptation, but only estimated 2D poses are available in practice. This paper addresses adapting models to streaming videos with estimated 2D poses. Comparing adaptations reveals the challenge of limiting estimation errors while preserving accurate pose information. To this end, we propose adaptive aggregation, a two-stage optimization, and local augmentation for handling varying levels of estimated pose error. First, we perform adaptive aggregation across videos to initialize the model state with labeled representative samples. Within each video, we use a two-stage optimization to benefit from 2D fitting while minimizing the impact of erroneous updates. Second, we employ local augmentation, using adjacent confident samples to update the model before adapting to the current non-confident sample. Our method surpasses state-of-the-art by a large margin, advancing adaptation towards more practical settings of using estimated 2D poses.
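The local augmentation step described in the abstract can be illustrated with a minimal scheduling sketch. It is not the authors' implementation: the function name `local_augmentation_schedule`, the confidence `threshold`, the `window` size, and the choice to treat only earlier frames as "adjacent" (since the video is streamed) are all assumptions made for illustration. The sketch only decides which confident neighbouring frames would be used for a model update before a non-confident frame is adapted to. The model update itself is omitted.

```python
from typing import List, Tuple


def local_augmentation_schedule(
    confidences: List[float],
    threshold: float = 0.5,  # hypothetical cutoff separating confident frames
    window: int = 2,         # hypothetical number of earlier frames to inspect
) -> List[Tuple[int, List[int]]]:
    """Return (frame, neighbours) pairs in streaming order.

    For a confident frame the neighbour list is empty: adapt to it directly.
    For a non-confident frame the list holds the confident frames among the
    previous `window` frames, to be used for an update first.
    """
    schedule = []
    for t, conf in enumerate(confidences):
        if conf >= threshold:
            schedule.append((t, []))
            continue
        start = max(0, t - window)
        neighbours = [i for i in range(start, t) if confidences[i] >= threshold]
        schedule.append((t, neighbours))
    return schedule


# Illustrative per-frame confidences of the estimated 2D poses (made-up values).
confidences = [0.9, 0.8, 0.3, 0.7, 0.2]
schedule = local_augmentation_schedule(confidences)
for frame, neighbours in schedule:
    print(frame, neighbours)
```

In this toy stream, frames 2 and 4 fall below the assumed threshold, so each would be preceded by an update on its confident neighbours. How confidence is measured and how the update is performed are left to the paper.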

