LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
This work addresses defocus deblurring for photography and computational imaging applications. It is an incremental advance that integrates a vision-language model into an existing task.
The paper tackles the recovery of sharp images from dual-pixel pairs with disparity-dependent blur. It proposes a framework that uses CLIP to estimate blur maps without supervision and reports state-of-the-art performance in its experiments.
Recovering sharp images from dual-pixel (DP) pairs with disparity-dependent blur is a challenging task. Existing blur map-based deblurring methods have demonstrated promising results. In this paper, we propose, to the best of our knowledge, the first framework that introduces contrastive language-image pre-training (CLIP) to accurately estimate the blur map from a DP pair in an unsupervised manner. To achieve this, we first carefully design text prompts that enable CLIP to understand blur-related geometric prior knowledge from the DP pair. Then, we propose a format for feeding a stereo DP pair to CLIP without any fine-tuning, even though CLIP is pre-trained on monocular images. Given the estimated blur map, we introduce a blur-prior attention block, a blur-weighting loss, and a blur-aware loss to recover the all-in-focus image. Our method achieves state-of-the-art performance in extensive experiments (see Fig.~\ref{fig:teaser}).
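To make the prompt-based idea concrete, the following is a minimal sketch of generic CLIP-style zero-shot scoring, not the authors' implementation. It assumes that a patch embedding and one text embedding per blur prompt are already available, and it turns their cosine similarities into a soft blur score. The function name `blur_score`, the two-prompt setup, the toy 2-D embeddings, and the temperature value are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def blur_score(patch_emb, prompt_embs, blur_levels, temperature=0.01):
    """Expected blur level of a patch under a softmax over prompt similarities.

    patch_emb:   embedding of an image patch (assumed to come from an image encoder)
    prompt_embs: one embedding per text prompt (assumed to come from a text encoder)
    blur_levels: hypothetical numeric blur level attached to each prompt
    """
    logits = [cosine(patch_emb, p) / temperature for p in prompt_embs]
    probs = softmax(logits)
    return sum(p * level for p, level in zip(probs, blur_levels))


# Toy stand-ins for encoder outputs (hypothetical values, not real CLIP features).
prompt_embs = [[1.0, 0.0], [0.0, 1.0]]  # e.g. a "sharp" prompt and a "blurry" prompt
blur_levels = [0.0, 1.0]

sharp_patch = [0.9, 0.1]
blurry_patch = [0.1, 0.9]

sharp_score = blur_score(sharp_patch, prompt_embs, blur_levels)
blurry_score = blur_score(blurry_patch, prompt_embs, blur_levels)
```

Applying such a score to every patch of an image would yield a coarse blur map. How the paper actually designs its prompts and encodes the two DP views is not specified in this abstract.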