RAM: Recover Any 3D Human Motion in-the-Wild
RAM addresses the challenge of markerless 3D human motion capture in dynamic, real-world settings, with applications such as surveillance and sports analysis. It represents a strong incremental improvement over prior work.
The paper tackles the problem of robust 3D human motion reconstruction in uncontrolled environments by developing RAM, which integrates motion-aware tracking, temporal priors, and pose prediction. Results show it substantially outperforms previous state-of-the-art methods on benchmarks like PoseTrack and 3DPW in both tracking stability and 3D accuracy.
RAM incorporates a motion-aware semantic tracker with adaptive Kalman filtering to achieve robust identity association under severe occlusions and dynamic interactions. A memory-augmented Temporal HMR module further enhances human motion reconstruction by injecting spatio-temporal priors for consistent and smooth motion estimation. Moreover, a lightweight Predictor module forecasts future poses to maintain reconstruction continuity, while a gated combiner adaptively fuses reconstructed and predicted features to ensure coherence and robustness. Experiments on in-the-wild multi-person benchmarks such as PoseTrack and 3DPW demonstrate that RAM substantially outperforms previous state-of-the-art methods in both zero-shot tracking stability and 3D accuracy, offering a generalizable paradigm for markerless 3D human motion capture in-the-wild.
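To make the gated combiner idea concrete, the following is a minimal sketch of element-wise gated fusion between a reconstructed feature vector and a predicted one. It is not the paper's implementation: the function names (`sigmoid`, `gated_combine`), the sigmoid gate, and the feature values are all assumptions chosen for illustration, and in a trained model the gate logits would come from a learned layer rather than being passed in by hand.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping a real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_combine(
    reconstructed: Sequence[float],
    predicted: Sequence[float],
    gate_logits: Sequence[float],
) -> List[float]:
    """Fuse two feature vectors element-wise with a sigmoid gate.

    A gate near 1 keeps the reconstructed feature (the person is well observed);
    a gate near 0 falls back on the predicted feature (for example under occlusion).
    The logits are supplied directly here, which is an illustrative assumption.
    """
    if not (len(reconstructed) == len(predicted) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for r, p, z in zip(reconstructed, predicted, gate_logits):
        g = sigmoid(z)
        fused.append(g * r + (1.0 - g) * p)
    return fused


# Hypothetical 3-dimensional features for one person in one frame.
recon = [1.0, 2.0, 3.0]
pred = [0.0, 0.0, 0.0]

# Logit 0 blends both equally; a large positive logit trusts the reconstruction;
# a large negative logit trusts the prediction.
fused = gated_combine(recon, pred, gate_logits=[0.0, 50.0, -50.0])
print(fused)
```

The convex combination keeps every fused value between the reconstructed and predicted values, so the output degrades gracefully toward the forecast when the observation is unreliable instead of switching abruptly between the two sources.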