Learning Spatially Collaged Fourier Bases for Implicit Neural Representation
This work addresses artifacts in INR for tasks such as image fitting, video representation, and 3D shape representation. The method is incremental in scope but delivers strong, specific gains.
The paper tackles the limited representation capability of Implicit Neural Representation (INR) caused by universal Fourier bases, which produce artifacts in local regions. It introduces learnable spatial masks that dispatch distinct Fourier bases to their respective regions. The result is superior reconstruction quality, including an improvement of over 3dB PSNR in image fitting and 98.81 IoU in 3D reconstruction.
Existing approaches to Implicit Neural Representation (INR) can be interpreted as a global scene representation via a linear combination of Fourier bases of different frequencies. However, such universal basis functions can limit the representation capability in local regions where a specific component is unnecessary, resulting in unpleasant artifacts. To this end, we introduce a learnable spatial mask that effectively dispatches distinct Fourier bases into respective regions. This translates into collaging Fourier patches, thus enabling an accurate representation of complex signals. Comprehensive experiments demonstrate the superior reconstruction quality of the proposed approach over existing baselines across various INR tasks, including image fitting, video representation, and 3D shape representation. Our method outperforms all other baselines, improving the image fitting PSNR by over 3dB and 3D reconstruction to 98.81 IoU and 0.0011 Chamfer Distance.
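The idea of gating Fourier bases by region can be illustrated with a minimal 1D sketch. It is not the paper's implementation: the abstract does not specify how the mask is parameterized, so the per-cell sigmoid gate, the fixed (untrained) values, and all names here (`fourier_bases`, `masked_reconstruction`, `mask_logits`, `n_cells`) are assumptions made for illustration.

```python
import math

def fourier_bases(x, freqs):
    """Evaluate sin/cos Fourier bases at a coordinate x in [0, 1]."""
    feats = []
    for f in freqs:
        feats.append(math.sin(2 * math.pi * f * x))
        feats.append(math.cos(2 * math.pi * f * x))
    return feats

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def masked_reconstruction(x, freqs, coeffs, mask_logits, n_cells):
    """Linear combination of Fourier bases, gated per spatial cell.

    mask_logits[c][k] is a hypothetical learnable logit deciding how much
    basis k contributes inside grid cell c (assumed parameterization).
    """
    cell = min(int(x * n_cells), n_cells - 1)  # which region x falls in
    feats = fourier_bases(x, freqs)
    gates = [sigmoid(m) for m in mask_logits[cell]]
    return sum(g * a * b for g, a, b in zip(gates, coeffs, feats))

# Two frequencies; coefficients ordered as (sin f1, cos f1, sin f2, cos f2).
freqs = [1, 8]
coeffs = [1.0, 0.0, 0.5, 0.0]

# Cell 0 keeps every basis; cell 1 switches off the high-frequency pair.
ON, OFF = 20.0, -20.0
mask_logits = [
    [ON, ON, ON, ON],
    [ON, ON, OFF, OFF],
]

left = masked_reconstruction(0.2, freqs, coeffs, mask_logits, n_cells=2)
right = masked_reconstruction(0.8, freqs, coeffs, mask_logits, n_cells=2)
print(left, right)
```

In this toy setup the left half of the domain uses both frequencies, while the right half receives only the low-frequency component. This mirrors the "collage" of Fourier patches described above. In a real INR the mask and coefficients would be learned jointly rather than set by hand.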