CV · Jul 16, 2025

SpatialTrackerV2: 3D Point Tracking Made Easy

Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou

arXiv:2507.12462v2 · 36.574 citations · h-index: 66 · Has Code

Originality: Highly original

AI Analysis

This work addresses efficient and accurate 3D point tracking for computer vision applications and represents a substantial advance rather than an incremental improvement.

The paper tackles 3D point tracking in monocular videos with SpatialTrackerV2, a feed-forward method that unifies point tracking, depth estimation, and camera pose estimation. It outperforms existing 3D tracking methods by 30% and matches the accuracy of dynamic 3D reconstruction approaches while running 50 times faster.

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing, feed-forward 3D point tracker. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable, end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30% and matches the accuracy of leading dynamic 3D reconstruction approaches while running 50× faster.
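The decomposition of world-space motion into scene geometry, camera ego-motion, and object motion can be illustrated with standard pinhole-camera geometry. The sketch below is not the authors' implementation: SpatialTrackerV2 predicts these quantities with a network, whereas here they are simply given. It assumes known intrinsics (fx, fy, cx, cy), camera-to-world poses (R, t), and a per-frame depth for each tracked pixel; the names `Intrinsics`, `Pose`, `unproject`, `cam_to_world`, `world_track`, and `object_motion` are hypothetical. Each 2D track point is lifted with its depth (geometry), mapped into a shared world frame with the camera pose (ego-motion), and any remaining frame-to-frame world displacement is treated as object motion.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Intrinsics:
    """Hypothetical pinhole intrinsics (assumed known here)."""
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass
class Pose:
    """Hypothetical camera-to-world transform: X_world = R @ X_cam + t."""
    R: Mat3
    t: Vec3


def unproject(u: float, v: float, depth: float, K: Intrinsics) -> Vec3:
    # Scene geometry: lift pixel (u, v) with its depth into the camera frame.
    return ((u - K.cx) / K.fx * depth, (v - K.cy) / K.fy * depth, depth)


def cam_to_world(p: Vec3, pose: Pose) -> Vec3:
    # Camera ego-motion: express a camera-frame point in the shared world frame.
    return tuple(
        sum(pose.R[i][j] * p[j] for j in range(3)) + pose.t[i] for i in range(3)
    )


def world_track(
    track: Sequence[Tuple[float, float, float]],
    poses: Sequence[Pose],
    K: Intrinsics,
) -> List[Vec3]:
    # track holds (u, v, depth) per frame; poses holds one Pose per frame.
    return [cam_to_world(unproject(u, v, d, K), pose) for (u, v, d), pose in zip(track, poses)]


def object_motion(world_points: Sequence[Vec3]) -> List[Vec3]:
    # Motion left once geometry and ego-motion are accounted for:
    # the frame-to-frame displacement of the point in world coordinates.
    return [tuple(b[i] - a[i] for i in range(3)) for a, b in zip(world_points, world_points[1:])]


# Example: the camera translates +1 along x between two frames.
K = Intrinsics(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
I3: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
poses = [Pose(I3, (0.0, 0.0, 0.0)), Pose(I3, (1.0, 0.0, 0.0))]

static_track = [(50.0, 50.0, 5.0), (30.0, 50.0, 5.0)]  # pixel shifts only because the camera moved
moving_track = [(50.0, 50.0, 5.0), (40.0, 50.0, 5.0)]  # point also moves +0.5 in world x

print(object_motion(world_track(static_track, poses, K)))  # [(0.0, 0.0, 0.0)]
print(object_motion(world_track(moving_track, poses, K)))  # [(0.5, 0.0, 0.0)]
```

Although the static point's pixel moves from u = 50 to u = 30, its world-space residual is zero because the shift is fully explained by camera translation; the moving point keeps 0.5 units of world-space motion. This is why estimating depth and camera pose jointly with the 2D track is what makes object motion recoverable from a monocular video.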

