CV · Apr 21, 2023

Factored Neural Representation for Scene Understanding

arXiv:2304.10950v3 · 4 citations · h-index: 73
Originality: Incremental advance
AI Analysis

This work addresses scene understanding for robotics and AR/VR applications by enabling scene manipulation beyond novel view synthesis. It is an incremental advance because it builds on existing neural implicit representations.

The paper tackles the problem of learning interpretable, editable object-level scene representations directly from monocular RGB-D video, particularly when objects move or deform. It demonstrates on synthetic and real data that its factored neural representation is efficient, interpretable, and editable, for example by changing object trajectories.

A long-standing goal in scene understanding is to obtain interpretable and editable representations that can be directly constructed from a raw monocular RGB-D video, without requiring specialized hardware setups or priors. The problem is significantly more challenging in the presence of multiple moving and/or deforming objects. Traditional methods have approached the setup with a mix of simplifications, scene priors, pretrained templates, or known deformation models. The advent of neural representations, especially neural implicit representations and radiance fields, opens the possibility of end-to-end optimization to collectively capture geometry, appearance, and object motion. However, current approaches produce a global scene encoding, assume multiview capture with limited or no motion in the scenes, and do not facilitate easy manipulation beyond novel view synthesis. In this work, we introduce a factored neural scene representation that can be learned directly from a monocular RGB-D video to produce object-level neural representations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformations (e.g., nonrigid movement). We evaluate our method against a set of neural approaches on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable (e.g., change object trajectory). Code and data are available at http://geometry.cs.ucl.ac.uk/projects/2023/factorednerf.
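To make the idea of a factored, editable representation concrete, here is a minimal toy sketch. It is not the authors' implementation: the names `RigidPose`, `SceneObject`, `sphere_sdf`, and `scene_sdf` are hypothetical, an analytic sphere distance function stands in for a learned per-object neural field, and objects are combined with a simple minimum over distances. The only point it illustrates is that when each object keeps its own canonical field plus an explicit per-frame rigid pose, editing a trajectory amounts to swapping the pose list while the object's field stays untouched.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class RigidPose:
    """Hypothetical per-frame pose: rotate about z by `yaw`, then translate."""
    yaw: float
    translation: Vec3

    def world_to_object(self, p: Vec3) -> Vec3:
        # Invert x_world = R(yaw) @ x_obj + t, i.e. x_obj = R^T @ (x_world - t).
        x = p[0] - self.translation[0]
        y = p[1] - self.translation[1]
        z = p[2] - self.translation[2]
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return (c * x + s * y, -s * x + c * y, z)


@dataclass
class SceneObject:
    # Stand-in for a learned per-object field (assumption: a signed distance).
    canonical_sdf: Callable[[Vec3], float]
    # Explicit motion encoding: one rigid pose per frame.
    trajectory: List[RigidPose]


def sphere_sdf(radius: float) -> Callable[[Vec3], float]:
    """Analytic sphere centred at the object's canonical origin."""
    def sdf(p: Vec3) -> float:
        return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius
    return sdf


def scene_sdf(objects: List[SceneObject], frame: int, p: Vec3) -> float:
    """Query the composed scene: map p into each object's frame, take the min."""
    return min(
        obj.canonical_sdf(obj.trajectory[frame].world_to_object(p))
        for obj in objects
    )


# A ball moving along +x over three frames.
ball = SceneObject(
    canonical_sdf=sphere_sdf(0.5),
    trajectory=[RigidPose(0.0, (float(t), 0.0, 0.0)) for t in range(3)],
)
query = (2.0, 0.0, 0.0)
before = scene_sdf([ball], 2, query)  # query sits at the ball's centre in frame 2

# Edit: send the same ball along +y instead; its canonical field is unchanged.
ball.trajectory = [RigidPose(0.0, (0.0, float(t), 0.0)) for t in range(3)]
after = scene_sdf([ball], 2, query)   # the ball is now far from the query point
```

In this sketch, a single global encoding would have to be refit to move the ball, whereas the factored form changes only the trajectory list. Nonrigid deformation, appearance, and the actual optimization from RGB-D video are deliberately left out.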

