RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
This dataset addresses a bottleneck for video-generation researchers by providing metric-scale geometric consistency in dynamic scenes. The contribution is incremental in that it builds on prior static-scene datasets.
The authors address the limitations of existing datasets for camera-controllable video generation by introducing RealCam-Vid, which they describe as the first fully open-source, high-resolution dynamic-scene dataset with metric-scale camera annotations. These annotations are intended to enable more realistic object motions and more precise camera trajectories.
Recent advances in camera-controllable video generation have been constrained by a reliance on static-scene datasets with relative-scale camera annotations, such as RealEstate10K. While these datasets enable basic viewpoint control, they fail to capture dynamic scene interactions and lack metric-scale geometric consistency, both of which are critical for synthesizing realistic object motions and precise camera trajectories in complex environments. To bridge this gap, we introduce the first fully open-source, high-resolution dynamic-scene dataset with metric-scale camera annotations, available at https://github.com/ZGCTroy/RealCam-Vid.