ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation
This work addresses the trade-off between accuracy and memory in multi-scene representation for applications such as raytracing, though it appears incremental because it builds on existing neural field methods.
The paper tackles multi-shape representation by encoding multiple shapes as continuous neural fields with higher precision and lower memory usage than prior approaches, and reports state-of-the-art reconstruction and compression results across diverse datasets.
State-of-the-art methods for multi-shape representation (a single model "packing" multiple objects) commonly trade modeling accuracy against memory and storage. We show how to encode multiple shapes represented as continuous neural fields with a higher degree of precision than previously possible and with low memory usage. Key to our approach is a recursive hierarchical formulation that exploits object self-similarity, leading to a highly compressed and efficient shape latent space. Thanks to the recursive formulation, our method supports spatial and global-to-local latent feature fusion without needing to initialize and maintain auxiliary data structures, while still allowing for continuous field queries to enable applications such as raytracing. In experiments on a set of diverse datasets, we provide compelling qualitative results and demonstrate state-of-the-art multi-scene reconstruction and compression results with a single network per dataset.
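As a rough illustration of the recursive idea described above, the following minimal sketch shows one way a single per-shape latent could be expanded recursively along the path to a query point, so that no explicit tree is stored. It is not the authors' architecture: the octree-style subdivision, the summation used for global-to-local fusion, the random untrained weights, and all names (`LATENT_DIM`, `NUM_LEVELS`, `W_split`, `W_head`, `query_field`) are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8   # hypothetical latent size
NUM_LEVELS = 3   # hypothetical recursion depth

# Hypothetical shared weights, reused at every level (the "recursive" part):
# one linear map turns a parent latent into 8 child latents.
W_split = rng.normal(scale=0.1, size=(LATENT_DIM, 8 * LATENT_DIM))
# Hypothetical field head: maps (fused latent, local coords) to a scalar.
W_head = rng.normal(scale=0.1, size=(LATENT_DIM + 3,))


def query_field(shape_latent, point):
    """Evaluate a scalar field at `point` in [0, 1)^3 for one shape latent."""
    latent = np.asarray(shape_latent, dtype=float)
    fused = latent.copy()
    local = np.asarray(point, dtype=float)
    for _ in range(NUM_LEVELS):
        # Expand the current latent into 8 child latents on the fly.
        children = np.tanh(latent @ W_split).reshape(8, LATENT_DIM)
        # Pick the child cell (octant) that contains the query point.
        octant_bits = (local >= 0.5).astype(int)
        child_index = octant_bits[0] * 4 + octant_bits[1] * 2 + octant_bits[2]
        latent = children[child_index]
        # Global-to-local fusion by summation (an assumption of this sketch).
        fused = fused + latent
        # Re-normalize the coordinates into the chosen child cell.
        local = local * 2.0 - octant_bits
    return float(np.concatenate([fused, local]) @ W_head)


# One latent per shape; a single set of weights serves all shapes.
shape_a = rng.normal(size=LATENT_DIM)
value = query_field(shape_a, [0.1, 0.6, 0.9])
```

Because child latents are recomputed from the parent on each query, only the per-shape root latent and the shared weights need to be kept, and the field can be evaluated at arbitrary real-valued coordinates, which is what a raytracer needs. In a trained model the weights would be learned rather than sampled at random.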