CV · Jun 1

TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

arXiv:2606.02350 · 63.6
Predicted impact: top 52% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the problem of coherent 4D human-scene reconstruction for multi-view video understanding, which is crucial for applications like AR/VR and robotics.

TROPHIES introduces a unified framework for jointly reconstructing dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally aligned and physically plausible 4D reconstructions. It outperforms prior methods on the EgoHuman and EgoExo4D datasets in both global fidelity and human-scene consistency.

Reconstructing humans and their surrounding environments in a globally consistent 4D space is essential for comprehensive perception. However, prior works typically assume single-view inputs or decouple humans, scenes, and cameras, making them unable to recover coherent geometry, stable motion, and physically aligned trajectories. These limitations motivate us to introduce a new task: unified human-scene-camera reconstruction from multi-view videos, which aims to jointly estimate dynamic humans, static scenes, and camera poses in one global coordinate frame. We propose TROPHIES (Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos), a unified framework tailored for this task. TROPHIES features a Human Branch that models humans through temporal and spatial reasoning, and a Scene Branch that reconstructs static geometry with human-aware attention. A global alignment and optimization module couples both branches by enforcing scale consistency, contact priors, and cross-view temporal coherence. Experiments on EgoHuman and EgoExo4D demonstrate that TROPHIES achieves globally aligned, physically plausible 4D reconstructions and consistently outperforms existing paradigms in both global fidelity and human-scene consistency.
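
To make the idea of placing humans and scenes in one global coordinate frame with a consistent scale more concrete, here is a minimal sketch of a closed-form similarity alignment (the Umeyama method) between two sets of corresponding 3D points. This is not the TROPHIES alignment module, whose details the abstract does not give; it is a standard technique shown purely for illustration. The function name `similarity_align` and the assumption that point correspondences are already known are hypothetical.

```python
import numpy as np


def similarity_align(src, dst):
    """Least-squares similarity transform with dst ~ scale * R @ src + t.

    Illustrative only: a standard Umeyama alignment, NOT the paper's module.
    src, dst: (N, 3) arrays of corresponding 3D points (assumed known).
    Returns (scale, R, t) with R a 3x3 rotation and t a 3-vector.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Centre both point sets on their centroids.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # 3x3 cross-covariance between target and source.
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt

    # Scale from the singular values and the source variance.
    var_src = (src_c ** 2).sum() / len(src)
    scale = np.trace(np.diag(S) @ D) / var_src

    # Translation that maps the source centroid onto the target centroid.
    t = mu_dst - scale * R @ mu_src
    return scale, R, t
```

In a pipeline of the kind the abstract describes, a step like this could map, for example, human joints predicted at an arbitrary scale onto metric scene geometry before any joint refinement with contact or temporal terms; that pairing is an assumption made here for illustration.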
