CV
Jul 22, 2023

Replay: Multi-modal Multi-view Acted Videos for Casual Holography

Roman Shapovalov, Yanir Kleiman, Ignacio Rocco, David Novotny, Andrea Vedaldi, Changan Chen, Filippos Kokkinos, Ben Graham, Natalia Neverova

Meta AI

arXiv:2307.12067v19.811 citations
h-index: 105
Has Code

Originality: Synthesis-oriented

AI Analysis

This dataset addresses the need for high-quality, annotated multi-modal data for computer vision tasks such as novel-view synthesis and 3D reconstruction, though the contribution is incremental because it builds on existing dataset efforts.

The authors introduce Replay, a large-scale multi-view, multi-modal video dataset of social human interactions. It contains over 4000 minutes of footage and over 7 million timestamped frames annotated with camera poses and, for part of the data, foreground masks. They also establish a benchmark for novel-view synthesis and evaluate several baseline methods on it.

We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. Each scene is filmed in high production quality, from different viewpoints with several static cameras, as well as wearable action cameras, and recorded with a large array of microphones at different positions in the room. Overall, the dataset contains over 4000 minutes of footage and over 7 million timestamped high-resolution frames annotated with camera poses and partially with foreground masks. The Replay dataset has many potential applications, such as novel-view synthesis, 3D reconstruction, novel-view acoustic synthesis, human body and face analysis, and training generative models. We provide a benchmark for training and evaluating novel-view synthesis, with two scenarios of different difficulty. Finally, we evaluate several baseline state-of-the-art methods on the new benchmark.
