CV · Apr 1

Forecasting Motion in the Wild

arXiv:2604.0101580.01 citations
Predicted impact: top 28% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the challenge of predictive visual intelligence for agents in unconstrained environments, offering a domain-specific solution with incremental improvements.

The paper forecasts the future motion of diverse non-rigid agents, such as animals in the wild, using dense point trajectories as the visual representation. The authors report that this approach outperforms state-of-the-art baselines and generalizes to rare species.

Visual intelligence requires anticipating the future behavior of agents, yet vision systems lack a general representation for motion and behavior. We propose dense point trajectories as visual tokens for behavior, a structured mid-level representation that disentangles motion from appearance and generalizes across diverse non-rigid agents, such as animals in-the-wild. Building on this abstraction, we design a diffusion transformer that models unordered sets of trajectories and explicitly reasons about occlusion, enabling coherent forecasts of complex motion patterns. To evaluate at scale, we curate 300 hours of unconstrained animal video with robust shot detection and camera-motion compensation. Experiments show that forecasting trajectory tokens achieves category-agnostic, data-efficient prediction, outperforms state-of-the-art baselines, and generalizes to rare species and morphologies, providing a foundation for predictive visual intelligence in the wild.
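
To make the representation concrete, here is a minimal sketch of how a set of point trajectories with per-frame visibility flags might be stored, split into observed and future segments, and scored while ignoring occluded points. Every name here (`make_tracks`, `split_history_future`, `constant_velocity_forecast`, `masked_displacement_error`) is hypothetical, the toy data is synthetic, and the constant-velocity extrapolation is only a stand-in baseline, not the paper's diffusion transformer. The masked error is likewise an illustrative assumption rather than the paper's evaluation protocol.

```python
import math


def make_tracks(num_points, num_frames):
    """Build synthetic toy tracks (an assumption for illustration only).

    Each track is a list of (x, y, visible) tuples, one per frame.
    Points move linearly; some samples are flagged as occluded.
    """
    tracks = []
    for i in range(num_points):
        x0, y0 = 10.0 * i, 5.0 * i
        vx, vy = 1.0 + 0.5 * i, -0.25 * i
        track = []
        for t in range(num_frames):
            visible = (i + t) % 5 != 0  # deterministic occlusion pattern
            track.append((x0 + vx * t, y0 + vy * t, visible))
        tracks.append(track)
    return tracks


def split_history_future(tracks, num_observed):
    """Split every track into an observed prefix and a future suffix."""
    history = [track[:num_observed] for track in tracks]
    future = [track[num_observed:] for track in tracks]
    return history, future


def constant_velocity_forecast(history, horizon):
    """Stand-in baseline: extrapolate from the last two observed positions."""
    forecasts = []
    for track in history:
        (xa, ya, _), (xb, yb, _) = track[-2], track[-1]
        vx, vy = xb - xa, yb - ya
        forecasts.append(
            [(xb + vx * k, yb + vy * k) for k in range(1, horizon + 1)]
        )
    return forecasts


def masked_displacement_error(forecasts, future):
    """Mean Euclidean error over ground-truth points flagged as visible."""
    total, count = 0.0, 0
    for predicted, truth in zip(forecasts, future):
        for (px, py), (tx, ty, visible) in zip(predicted, truth):
            if visible:
                total += math.hypot(px - tx, py - ty)
                count += 1
    return total / count if count else float("nan")


tracks = make_tracks(num_points=6, num_frames=10)
history, future = split_history_future(tracks, num_observed=6)
forecasts = constant_velocity_forecast(history, horizon=4)
print(masked_displacement_error(forecasts, future))
```

Because the error is an average over the set of tracks, reordering the tracks (together with their forecasts) leaves it unchanged up to floating-point rounding. That mirrors the abstract's treatment of trajectories as an unordered set.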
