Multi-Level Neural Scene Graphs for Dynamic Urban Environments
This work addresses scalable and efficient 3D scene reconstruction for dynamic urban environments. The contribution is incremental in that it builds on existing radiance field techniques.
The paper estimates radiance fields for large-scale dynamic urban environments from multiple vehicle captures, and reports outperforming prior methods by a significant margin while training and rendering faster.
We estimate the radiance field of large-scale dynamic areas from multiple vehicle captures under varying environmental conditions. Previous works in this domain are either restricted to static environments, do not scale to more than a single short video, or struggle to separately represent dynamic object instances. To this end, we present a novel, decomposable radiance field approach for dynamic urban environments. We propose a multi-level neural scene graph representation that scales to thousands of images from dozens of sequences with hundreds of fast-moving objects. To enable efficient training and rendering of our representation, we develop a fast composite ray sampling and rendering scheme. To test our approach in urban driving scenarios, we introduce a new benchmark for novel view synthesis. We show that our approach outperforms prior art by a significant margin on both established benchmarks and our proposed benchmark while being faster in training and rendering.
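
To make the idea of composite rendering concrete, the following is a minimal sketch of how samples from several radiance fields along one ray could be merged and rendered together. It is not the scheme developed in the paper: it only applies the standard volume rendering quadrature to depth-sorted samples, and every name and value in it (`composite_render`, the background and object sample arrays, the densities and colors) is a hypothetical choice made for illustration.

```python
import numpy as np

def composite_render(t, sigma, rgb):
    """Render one ray from samples that may come from several fields.

    t:     (N,) sample depths along the ray
    sigma: (N,) volume densities at the samples
    rgb:   (N, 3) colors at the samples
    Illustrative only; standard volume rendering quadrature.
    """
    # Merge all samples into a single front-to-back ordering.
    order = np.argsort(t)
    t, sigma, rgb = t[order], sigma[order], rgb[order]
    # Interval length after each sample; the last interval is treated as very long.
    delta = np.append(np.diff(t), 1e10)
    # Opacity contributed by each interval.
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance: fraction of light that reaches each sample unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha[:-1])])
    weights = trans * alpha
    return weights @ rgb, weights

# Hypothetical static background samples: empty space, then a blue wall at depth 5.
bg_t = np.array([1.0, 3.0, 5.0])
bg_sigma = np.array([0.0, 0.0, 50.0])
bg_rgb = np.tile([0.0, 0.0, 1.0], (3, 1))

# Hypothetical dynamic object samples: an opaque red object in front of the wall.
obj_t = np.array([2.0, 2.5])
obj_sigma = np.array([50.0, 50.0])
obj_rgb = np.tile([1.0, 0.0, 0.0], (2, 1))

color, weights = composite_render(
    np.concatenate([bg_t, obj_t]),
    np.concatenate([bg_sigma, obj_sigma]),
    np.concatenate([bg_rgb, obj_rgb]),
)
```

In this toy setup the red object occludes the blue wall, so the rendered color is almost entirely red. Sorting the merged samples by depth is what lets independently represented objects and background occlude one another correctly along a single ray.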