TöRF: Time-of-Flight Radiance Fields for Dynamic Scene View Synthesis
This work addresses dynamic scene reconstruction for applications such as view synthesis. Its contribution is incremental: it builds on existing neural radiance field methods by integrating measurements from ToF sensors.
The paper tackles the under-constrained problem of reconstructing dynamic 3D scenes from monocular video. Instead of relying on data-driven priors, it uses raw time-of-flight (ToF) camera measurements, which improves robustness to calibration errors and large motions.
Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, and so methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras. Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors that are now available on modern smartphones.
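The "limited unambiguous depth range" mentioned above can be illustrated with the textbook relation for continuous-wave ToF sensors: a scene point at depth d produces a phase shift of 4πfd/c at modulation frequency f, and because phase is only observable modulo 2π, depths beyond c/(2f) alias back into the valid range. The sketch below shows this standard relation only; it is not the paper's image formation model, and the 30 MHz modulation frequency and all function names are assumptions chosen for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def unambiguous_range(mod_freq_hz):
    """Largest depth whose round-trip phase shift stays below 2*pi."""
    return C / (2.0 * mod_freq_hz)


def wrapped_phase(depth_m, mod_freq_hz):
    """Phase shift of the returned signal, wrapped into [0, 2*pi)."""
    return (4.0 * math.pi * mod_freq_hz * depth_m / C) % (2.0 * math.pi)


def depth_from_phase(phase_rad, mod_freq_hz):
    """Depth a processed depth map would report for a given wrapped phase."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)


# Hypothetical modulation frequency (an assumption, not taken from the paper).
mod_freq = 30e6
max_range = unambiguous_range(mod_freq)  # roughly 5 m at 30 MHz

# A point 1 m beyond the unambiguous range aliases to roughly 1 m.
true_depth = max_range + 1.0
recovered = depth_from_phase(wrapped_phase(true_depth, mod_freq), mod_freq)

print(f"unambiguous range: {max_range:.3f} m")
print(f"true depth: {true_depth:.3f} m, depth from wrapped phase: {recovered:.3f} m")
```

A processed depth map reports only the aliased value, which is one reason the abstract argues for modeling the raw sensor measurements rather than derived depth.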