CV · Mar 16

Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery

arXiv:2603.15603 · 26.9 · h-index: 10
Predicted impact: top 38% in CV (last 90 days) · Originality: Incremental advance
AI Analysis

This work addresses slow inference in full-body human mesh recovery and enables real-time applications such as humanoid control from a single RGB stream. The contribution is incremental: it accelerates an existing method rather than changing it fundamentally.

The paper tackles the high inference latency of SAM 3D Body for monocular 3D human mesh recovery by introducing a training-free acceleration framework, achieving up to a 10.9x speedup while maintaining comparable accuracy and enabling real-time applications like vision-only teleoperation.

SAM 3D Body (3DB) achieves state-of-the-art accuracy in monocular 3D human mesh recovery, yet its inference latency of several seconds per image precludes real-time application. We present Fast SAM 3D Body, a training-free acceleration framework that reformulates the 3DB inference pathway to achieve interactive rates. By decoupling serial spatial dependencies and applying architecture-aware pruning, we enable parallelized multi-crop feature extraction and streamlined transformer decoding. Moreover, to extract the joint-level kinematics (SMPL) compatible with existing humanoid control and policy learning frameworks, we replace the iterative mesh fitting with a direct feedforward mapping, accelerating this specific conversion by over 10,000x. Overall, our framework delivers up to a 10.9x end-to-end speedup while maintaining on-par reconstruction fidelity, even surpassing 3DB on benchmarks such as LSPET. We demonstrate its utility by deploying Fast SAM 3D Body in a vision-only teleoperation system that, unlike methods reliant on wearable IMUs, enables real-time humanoid control and the direct collection of manipulation policies from a single RGB stream.
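Why a direct feedforward mapping can be orders of magnitude faster than iterative fitting can be illustrated with a toy sketch. It assumes a *linear* stand-in body model `v = A @ theta` (the real SMPL and 3DB mesh models are nonlinear) and uses a precomputed pseudo-inverse as the "feedforward" regressor. The paper's actual mapping is not specified in this summary. All names (`A`, `fit_iterative`, `fit_feedforward`, `W`) are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: a linear "body model" maps parameters theta (size P)
# to flattened vertex coordinates v = A @ theta (size V).
# Real body models (SMPL, 3DB's mesh output) are nonlinear; this is only an illustration.
rng = np.random.default_rng(0)
P, V = 10, 300
A = rng.standard_normal((V, P))


def fit_iterative(v, steps=500):
    """Recover theta by gradient descent on 0.5 * ||A @ theta - v||^2.

    Every new input pays for `steps` matrix-vector products.
    """
    # Step size 1/L, where L = sigma_max(A)^2 is the Lipschitz constant of the gradient.
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    theta = np.zeros(P)
    for _ in range(steps):
        grad = A.T @ (A @ theta - v)
        theta -= lr * grad
    return theta


# Feedforward alternative: compute a regressor once, offline (here the pseudo-inverse).
# Each conversion afterwards is a single matrix-vector product.
W = np.linalg.pinv(A)  # shape (P, V)


def fit_feedforward(v):
    """Map vertices straight to parameters with the precomputed regressor."""
    return W @ v


# Usage: both approaches recover the same parameters for noise-free input.
theta_true = rng.standard_normal(P)
v = A @ theta_true
print(np.allclose(fit_iterative(v), theta_true), np.allclose(fit_feedforward(v), theta_true))
```

The one-time cost of building `W` moves work out of the per-frame loop. The feedforward form also batches naturally: stacking N vertex vectors as the columns of a `(V, N)` array and computing `W @ batch` converts all of them at once.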
