VQT-Light: Lightweight HDR Illumination Map Prediction with Richer Texture
This work addresses the challenge of restoring detailed texture while running fast in illumination map prediction for computer vision and graphics applications, and represents an incremental advancement over existing methods.
The paper tackles the problem of accurate lighting estimation in computer vision and graphics by proposing VQT-Light, a framework that predicts high-dynamic-range illumination maps with richer texture and better fidelity while remaining lightweight and fast. It achieves an inference speed of 40 FPS and improves multiple evaluation metrics.
Accurate lighting estimation is a significant yet challenging task in computer vision and graphics. However, existing methods either struggle to restore detailed textures of the illumination map or face challenges in running speed and texture fidelity. To tackle this problem, we propose a novel framework (VQT-Light) based on the VQVAE and ViT architectures. VQT-Light includes two modules: feature extraction and lighting estimation. First, we take advantage of VQVAE to extract discrete rather than continuous features of the illumination map, which avoids "posterior collapse". Second, we capture the global context and dependencies of the input image with a ViT rather than CNNs, which improves the prediction of illumination outside the field of view. Combining these two modules, we formulate lighting estimation as a multiclass classification task, which plays a key role in our pipeline. As a result, our model predicts illumination maps with richer texture and better fidelity while remaining lightweight and fast. VQT-Light achieves an inference speed of 40 FPS and improves multiple evaluation metrics. Qualitative and quantitative experiments demonstrate that the proposed method achieves superior results compared to existing state-of-the-art methods.
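To make the classification formulation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a VQVAE-style codebook of K vectors: the illumination map's latents are snapped to their nearest codebook entries, the resulting indices serve as class labels, and a predictor (a ViT in the paper, replaced here by random logits) is scored with multiclass cross-entropy. All names, shapes, and values (`quantize`, `cross_entropy`, `decode_prediction`, K = 8, D = 4) are hypothetical and chosen only for illustration.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (N, D) continuous encoder outputs (hypothetical shapes)
    codebook: (K, D) embedding vectors
    returns:  (N,) integer indices and the (N, D) quantized vectors
    """
    # Squared Euclidean distance between every latent and every code: (N, K)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    return indices, codebook[indices]

def cross_entropy(logits, targets):
    """Mean multiclass cross-entropy; logits is (N, K), targets is (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def decode_prediction(logits, codebook):
    """Pick the most likely code per position and look up its embedding."""
    return codebook[logits.argmax(axis=1)]

rng = np.random.default_rng(0)
K, D, N = 8, 4, 6  # assumed codebook size, code dimension, latent positions

# Toy codebook with well-separated rows (an assumption; a real one is learned).
codebook = np.arange(K * D, dtype=float).reshape(K, D)

# Toy "encoder outputs": codebook rows plus small noise.
latents = codebook[[3, 1, 7, 0, 3, 5]] + 0.01 * rng.normal(size=(N, D))

# Discrete targets for the classifier are the nearest-code indices.
targets, quantized = quantize(latents, codebook)

# Stand-in for the image-conditioned predictor's output (random here).
logits = rng.normal(size=(N, K))
loss = cross_entropy(logits, targets)
```

Under this formulation the predictor never regresses pixel values directly: it chooses among K learned codes per latent position, and a decoder (omitted here) would turn the selected code vectors back into an illumination map.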