CV · Dec 15, 2022

NeuralDome: A Neural Modeling Pipeline on Multi-View Human-Object Interactions

arXiv:2212.07626v1 · 72 citations · h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the need for free-viewpoint interaction datasets and tools in computer vision research, though the advance is incremental: it builds on existing multi-view capture and neural rendering methods.

The authors tackle the occlusions and ambiguities that arise when human-object interactions are captured from a fixed viewpoint. They construct a dense multi-view dome to acquire the HODome dataset (~75M frames) and develop NeuralDome, a neural processing pipeline for accurate tracking, reconstruction, and free-view rendering, and they demonstrate its effectiveness on a variety of tasks.

Humans constantly interact with objects in daily life tasks. Capturing such processes and subsequently conducting visual inferences from a fixed viewpoint suffers from occlusions, shape and texture ambiguities, motions, etc. To mitigate the problem, it is essential to build a training dataset that captures free-viewpoint interactions. We construct a dense multi-view dome to acquire a complex human-object interaction dataset, named HODome, that consists of ~75M frames on 10 subjects interacting with 23 objects. To process the HODome dataset, we develop NeuralDome, a layer-wise neural processing pipeline tailored for multi-view video inputs to conduct accurate tracking, geometry reconstruction, and free-view rendering for both human subjects and objects. Extensive experiments on the HODome dataset demonstrate the effectiveness of NeuralDome on a variety of inference, modeling, and rendering tasks. Both the dataset and the NeuralDome tools will be disseminated to the community for further development.

