CV | Apr 24, 2025

Dynamic Camera Poses and Where to Find Them

arXiv:2504.17788v1 | 23 citations | h-index: 29 | CVPR
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for scalable camera pose annotation on dynamic Internet videos, which supports realistic video generation and simulation. It represents a domain-specific incremental improvement.

The authors tackle the problem of annotating camera poses on dynamic Internet videos by introducing DynPose-100K, a large-scale dataset of 100,000 videos with camera poses. Their pipeline filters candidate videos with a combination of task-specific and generalist models, then estimates poses by combining point tracking, dynamic masking, and structure-from-motion.

Annotating camera poses on dynamic Internet videos at scale is critical for advancing fields like realistic video generation and simulation. However, collecting such a dataset is difficult, as most Internet videos are unsuitable for pose estimation. Furthermore, annotating dynamic Internet videos presents significant challenges even for state-of-the-art methods. In this paper, we introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses. Our collection pipeline addresses filtering using a carefully combined set of task-specific and generalist models. For pose estimation, we combine the latest techniques of point tracking, dynamic masking, and structure-from-motion to achieve improvements over the state-of-the-art approaches. Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes, opening up avenues for advancements in various downstream applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
