CV · Nov 26, 2025

TAPVid-360: Tracking Any Point in 360 from Narrow Field of View Video

arXiv:2511.21946v2
Originality: Incremental advance
AI Analysis

This work targets a gap in current vision systems: persistent, panoramic scene understanding for applications such as robotics and AR. It is rated incremental because it builds on prior TAP tasks.

The paper tackles the problem of tracking points outside the field of view in videos by introducing TAPVid-360, a task that predicts 3D directions to queried points across sequences, and presents a dataset and baseline method that outperforms existing approaches.

Humans excel at constructing panoramic mental models of their surroundings, maintaining object permanence and inferring scene structure beyond visible regions. In contrast, current artificial vision systems struggle with persistent, panoramic understanding, often processing scenes egocentrically on a frame-by-frame basis. This limitation is pronounced in the Track Any Point (TAP) task, where existing methods fail to track 2D points outside the field of view. To address this, we introduce TAPVid-360, a novel task that requires predicting the 3D direction to queried scene points across a video sequence, even when far outside the narrow field of view of the observed video. This task fosters learning allocentric scene representations without needing dynamic 4D ground truth scene models for training. Instead, we exploit 360 videos as a source of supervision, resampling them into narrow field-of-view perspectives while computing ground truth directions by tracking points across the full panorama using a 2D pipeline. We introduce a new dataset and benchmark, TAPVid360-10k, comprising 10k perspective videos with ground truth directional point tracking. Our baseline adapts CoTracker v3 to predict per-point rotations for direction updates, outperforming existing TAP and TAPVid-3D methods. Project page: https://finlay-hudson.github.io/tapvid360
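
To make the supervision idea in the abstract concrete, here is a minimal sketch of how a point tracked in a panorama could be turned into a direction relative to a narrow field-of-view camera. It is not the authors' pipeline. It assumes an equirectangular panorama, a y-up coordinate frame with +z as the forward axis, a virtual camera that only yaws, and a square field of view. All function names are hypothetical.

```python
import numpy as np

def equirect_to_direction(u, v, width, height):
    """Map equirectangular pixel coords to unit direction vectors.

    Assumed convention: the image centre looks along +z, +x is to the
    right, +y is up.
    """
    lon = (np.asarray(u, dtype=float) / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - np.asarray(v, dtype=float) / height) * np.pi
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )

def world_to_camera_yaw(yaw):
    """World-to-camera rotation for a camera facing longitude `yaw` (radians)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

def in_field_of_view(d_cam, fov_deg):
    """True where a camera-frame direction falls inside a square pinhole FoV."""
    half = np.tan(np.radians(fov_deg) / 2.0)
    x, y, z = d_cam[..., 0], d_cam[..., 1], d_cam[..., 2]
    return (z > 0) & (np.abs(x) <= half * z) & (np.abs(y) <= half * z)

def angular_error_deg(a, b):
    """Angle in degrees between unit direction vectors a and b."""
    cos = np.clip(np.sum(a * b, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# A point tracked at three-quarters of the panorama width, on the horizon.
d_world = equirect_to_direction(u=1536, v=512, width=2048, height=1024)

# Seen from a camera facing the panorama centre, the point is outside the
# narrow view, yet its direction is still well defined.
d_cam = world_to_camera_yaw(0.0) @ d_world
visible = in_field_of_view(d_cam, fov_deg=60.0)
```

In this sketch the target for a query point is always a unit vector in the camera frame, whether or not `in_field_of_view` is true. That matches the task framing of predicting directions rather than 2D pixel positions, and an angular distance like `angular_error_deg` is one natural way to compare a predicted direction with the ground truth.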

