NeuralDome: A Neural Modeling Pipeline on Multi-View Human-Object Interactions
This work addresses the need for free-viewpoint human-object interaction datasets and tools in computer vision research. It is incremental, however, because it builds on existing multi-view capture and neural rendering methods.
The authors tackle the occlusions and ambiguities that arise when human-object interactions are captured from a fixed viewpoint. They build a dense multi-view dome to acquire the HODome dataset (~75M frames). They also develop NeuralDome, a neural processing pipeline for accurate tracking, geometry reconstruction, and free-view rendering, and show that it is effective on a variety of inference, modeling, and rendering tasks.
Humans constantly interact with objects in daily life tasks. When such processes are captured and then used for visual inference from a fixed viewpoint, the results suffer from occlusions, shape and texture ambiguities, motion, and similar issues. To mitigate these problems, it is essential to build a training dataset that captures free-viewpoint interactions. We construct a dense multi-view dome to acquire a complex human-object interaction dataset, named HODome, that consists of ~75M frames of 10 subjects interacting with 23 objects. To process the HODome dataset, we develop NeuralDome, a layer-wise neural processing pipeline tailored to multi-view video inputs. It performs accurate tracking, geometry reconstruction, and free-view rendering for both human subjects and objects. Extensive experiments on the HODome dataset demonstrate the effectiveness of NeuralDome on a variety of inference, modeling, and rendering tasks. Both the dataset and the NeuralDome tools will be released to the community for further development.