360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers
This work addresses a specific computer vision issue, artefacts in panoramic illumination estimation for applications such as image-based lighting, though the contribution is incremental because it builds on existing transformer adaptations.
The paper tackles artefacts in high dynamic range illumination estimation from equirectangular panoramas by proposing 360U-Former, a U-Net style Vision Transformer adapted to this format. The authors report that it outperforms state-of-the-art methods and avoids the seam and warping artefacts.
Recent illumination estimation methods have focused on enhancing the resolution and improving the quality and diversity of the generated textures. However, few have explored tailoring the neural network architecture to the Equirectangular Panorama (ERP) format utilised in image-based lighting. Consequently, the resulting high dynamic range images (HDRI) usually exhibit a seam at the side borders and textures or objects that are warped at the poles. To address this shortcoming, we propose a novel architecture, 360U-Former, based on a U-Net style Vision Transformer that builds on PanoSWIN, an adapted shifted window attention tailored to the ERP format. To the best of our knowledge, this is the first purely Vision Transformer model used in the field of illumination estimation. We train 360U-Former as a GAN to generate an HDRI from a limited field of view low dynamic range image (LDRI). We evaluate our method using current illumination estimation evaluation protocols and datasets, demonstrating that our approach outperforms existing and state-of-the-art methods without the artefacts typically associated with the use of the ERP format.
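To make the two ERP artefacts concrete, the following minimal sketch illustrates the geometric properties behind them. It is not the 360U-Former or PanoSWIN implementation; the function names (`seam_gap`, `shift_horizontally`, `solid_angle_weights`) are hypothetical and chosen only for illustration. The sketch assumes an ERP image stored as a NumPy array whose rows run from the top pole to the bottom pole and whose columns span 360 degrees of longitude, so the first and last columns are neighbours on the sphere and rows near the poles cover far less solid angle than rows near the equator.

```python
import numpy as np


def seam_gap(erp: np.ndarray) -> float:
    """Mean absolute difference between the first and last columns.

    On the sphere these two columns are adjacent, so a large value
    indicates a visible seam at the side borders of the panorama.
    """
    return float(np.mean(np.abs(erp[:, 0] - erp[:, -1])))


def shift_horizontally(erp: np.ndarray, pixels: int) -> np.ndarray:
    """Rotate the panorama about the vertical axis by a circular column shift.

    Columns pushed past one side border re-enter at the other, which
    mirrors the wrap-around adjacency of the ERP format.
    """
    return np.roll(erp, pixels, axis=1)


def solid_angle_weights(height: int) -> np.ndarray:
    """Relative solid angle covered by each ERP row, normalised to sum to 1.

    The weight is proportional to sin(polar angle), so rows near the
    poles are heavily stretched relative to the area they represent.
    """
    # Polar angle at the centre of each row: near 0 at the top, near pi at the bottom.
    theta = (np.arange(height) + 0.5) * np.pi / height
    weights = np.sin(theta)
    return weights / weights.sum()
```

A network that treats the panorama as an ordinary rectangular image sees neither property: nothing ties the left border to the right one, and every row is given equal importance. That is the kind of mismatch an ERP-aware architecture is meant to address.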