CV · Jan 4, 2025

Joint Optimization for 4D Human-Scene Reconstruction in the Wild

arXiv:2501.02158v1 · 22 citations · h-index: 8
AI Analysis

The paper addresses the challenge of reconstructing natural and diverse human-scene interactions from unconstrained web videos, with applications in understanding and predicting human movement in scenes.

The paper tackles the problem of reconstructing human motion and the surrounding environment from monocular web videos. It proposes JOSH, an optimization-based method that jointly optimizes scene geometry, camera poses, and human motion using human-scene contact constraints, and reports better results on both global human motion estimation and dense scene reconstruction. It also introduces JOSH3R, a more efficient model trained on pseudo-labels from web videos, which outperforms other optimization-free methods.

Reconstructing human motion and its surrounding environment is crucial for understanding human-scene interaction and predicting human movements in the scene. While much progress has been made in capturing human-scene interaction in constrained environments, those prior methods can hardly reconstruct the natural and diverse human motion and scene context found in web videos. In this work, we propose JOSH, a novel optimization-based method for 4D human-scene reconstruction in the wild from monocular videos. JOSH uses techniques from both dense scene reconstruction and human mesh recovery as initialization, and then leverages human-scene contact constraints to jointly optimize the scene, the camera poses, and the human motion. Experimental results show that, by jointly optimizing scene geometry and human motion, JOSH achieves better results on both global human motion estimation and dense scene reconstruction. We further design a more efficient model, JOSH3R, and train it directly with pseudo-labels from web videos. JOSH3R outperforms other optimization-free methods while being trained only on labels predicted by JOSH, further demonstrating JOSH's accuracy and generalization ability.
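To make the idea of joint optimization under a contact constraint more concrete, here is a minimal toy sketch in Python. It is not the JOSH implementation: the 1D setup, the function names (`contact_loss`, `joint_optimize`), the quadratic prior, and all parameter values are assumptions chosen for illustration. The scene is known only up to an unknown `scale`, the human's global position is off by an unknown `shift`, and the contact term asks that scaled scene points coincide with the shifted human contact points.

```python
# Toy 1D illustration of joint optimization with a contact constraint.
# Everything here (names, loss form, prior, step size) is a hypothetical
# sketch, not the method described in the paper.

def contact_loss(scale, shift, scene_depths, human_depths):
    """Mean squared gap between scaled scene points and shifted human contacts."""
    n = len(scene_depths)
    return sum(
        (scale * d - (h + shift)) ** 2
        for d, h in zip(scene_depths, human_depths)
    ) / n


def joint_optimize(scene_depths, human_depths, scale0=1.0, shift0=0.0,
                   prior_weight=0.1, lr=0.05, steps=2000):
    """Gradient descent over scene scale and human shift at the same time.

    scene_depths: up-to-scale depths of the scene surface at contact locations.
    human_depths: depths of the human contact joints from an initial estimate.
    prior_weight: strength of a quadratic prior keeping shift near shift0.
    """
    scale, shift = scale0, shift0
    n = len(scene_depths)
    for _ in range(steps):
        grad_scale = 0.0
        grad_shift = 0.0
        for d, h in zip(scene_depths, human_depths):
            residual = scale * d - (h + shift)      # contact gap for this pair
            grad_scale += 2.0 * residual * d / n    # d(loss)/d(scale)
            grad_shift += -2.0 * residual / n       # d(loss)/d(shift)
        grad_shift += 2.0 * prior_weight * (shift - shift0)  # prior gradient
        scale -= lr * grad_scale
        shift -= lr * grad_shift
    return scale, shift


if __name__ == "__main__":
    scene = [1.0, 2.0, 3.0]   # scene depths, correct only up to scale
    human = [2.0, 4.0, 6.0]   # human contact depths in metric units
    print(joint_optimize(scene, human))
```

In this toy case the contact points are consistent with a scene scale of 2 and no human shift, so the optimizer recovers approximately `scale = 2` and `shift = 0`. Updating both variables together is what allows the contact term to resolve the scale ambiguity of the scene while the prior keeps the human estimate close to its initialization.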
