Tell Me What You See: Text-Guided Real-World Image Denoising
The work addresses insufficient denoising in low-light photography by letting photographers supply a text description of the scene as additional guidance.
The paper tackles image denoising in extremely low-light conditions by using text-based scene descriptions as an additional prior, showing that this approach significantly improves denoising and reconstruction for synthetic and real-world images.
Image reconstruction from noisy sensor measurements is challenging, and many methods have been proposed for it. Most approaches focus on learning robust natural image priors while modeling the scene's noise statistics. In extremely low-light conditions, however, these methods are often insufficient, and additional information is needed, for example multiple captures of the same scene. As an alternative, we propose using a text-based description of the scene as an additional prior, something the photographer can easily provide. Inspired by the remarkable success of text-guided diffusion models in image generation, we show that adding image caption information significantly improves image denoising and reconstruction for both synthetic and real-world images.
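The abstract does not state which noise model is used to create the synthetic images. As a hedged illustration of why extremely low light defeats methods that rely only on image priors and noise statistics, the following is a minimal sketch assuming a Poisson-Gaussian sensor model (signal-dependent shot noise plus Gaussian read noise). The function names `simulate_low_light` and `empirical_snr` and the parameters `photons_per_unit` and `read_noise_std` are hypothetical and are not taken from the paper.

```python
import numpy as np


def simulate_low_light(clean, photons_per_unit, read_noise_std, rng):
    """Hypothetical Poisson-Gaussian sensor model (an assumption, not the paper's).

    clean: array of intensities in [0, 1]
    photons_per_unit: expected photon count at intensity 1.0 (lower = darker scene)
    read_noise_std: std of additive Gaussian read noise, in photon units
    Returns the noisy measurement rescaled back to the [0, 1] intensity scale.
    """
    expected_photons = clean * photons_per_unit
    # Shot noise: photon arrivals are Poisson, so variance grows with the signal.
    shot = rng.poisson(expected_photons).astype(np.float64)
    # Read noise: signal-independent electronic noise added by the sensor.
    read = rng.normal(0.0, read_noise_std, size=clean.shape)
    return (shot + read) / photons_per_unit


def empirical_snr(clean, noisy):
    """Mean clean intensity divided by the RMS error of the noisy image."""
    rms_error = np.sqrt(np.mean((noisy - clean) ** 2))
    return float(clean.mean() / rms_error)


rng = np.random.default_rng(0)
clean = np.full((64, 64), 0.5)  # flat mid-gray test image
for photons in (1000, 100, 10):  # progressively darker captures
    noisy = simulate_low_light(clean, photons, read_noise_std=2.0, rng=rng)
    print(photons, round(empirical_snr(clean, noisy), 2))
```

Under this assumed model, the signal-to-noise ratio for a constant intensity x is approximately x*P / sqrt(x*P + sigma^2), where P is the photon scale and sigma is the read-noise standard deviation, so it collapses as P shrinks. This is the regime where the abstract argues that extra information, such as a scene caption, becomes necessary.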