CLIP-Guided Unsupervised Semantic-Aware Exposure Correction
This work addresses exposure correction for image processing applications and offers an unsupervised approach that avoids manual labeling, but it is incremental because it builds on existing models such as CLIP and FastSAM.
The paper tackles exposure correction in images, a task hampered by color shift artifacts and the lack of ground-truth labels. It proposes an unsupervised semantic-aware network guided by CLIP and FastSAM, and reports outperforming state-of-the-art unsupervised methods on real-world improperly exposed images.
Improper exposure often leads to severe loss of detail, color distortion, and reduced contrast. Exposure correction still faces two critical challenges: (1) ignoring object-wise regional semantic information causes color shift artifacts; (2) real-world exposure images generally have no ground-truth labels, and labeling them entails massive manual editing. To tackle these challenges, we propose a new unsupervised semantic-aware exposure correction network. It contains an adaptive semantic-aware fusion module, which fuses the semantic information extracted from a pre-trained Fast Segment Anything Model (FastSAM) into a shared image feature space. The fused features are then used by our multi-scale residual spatial Mamba group to restore details and adjust the exposure. To avoid manual editing, we propose a pseudo-ground truth generator guided by CLIP, which is fine-tuned to automatically identify exposure situations and instruct the tailored corrections. We also leverage the rich priors from FastSAM and CLIP to develop a semantic-prompt consistency loss that enforces semantic consistency and image-prompt alignment for unsupervised training. Comprehensive experimental results demonstrate that our method is effective in correcting real-world exposure images and outperforms state-of-the-art unsupervised methods both numerically and visually.
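The idea of a CLIP-guided pseudo-ground truth generator can be illustrated with a minimal sketch. It is not the paper's implementation: the toy vectors stand in for real CLIP image and text embeddings, the prompt labels, the temperature, and the `GAMMA` table are hypothetical, and a simple gamma curve is swapped in for whatever tailored correction the authors actually use. The sketch only shows the two-step pattern of identifying the exposure situation by image-prompt similarity and then applying a correction chosen for that situation.

```python
import math

# Hypothetical prompt labels and gamma values, chosen only for illustration.
GAMMA = {"underexposed": 0.5, "overexposed": 2.0, "well-exposed": 1.0}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_exposure(image_emb, prompt_embs, temperature=0.01):
    """Return (best_label, probabilities) from a softmax over
    temperature-scaled cosine similarities between the image embedding
    and each prompt embedding. The temperature is an assumed value."""
    labels = list(prompt_embs)
    logits = [cosine(image_emb, prompt_embs[k]) / temperature for k in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = {k: e / total for k, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


def pseudo_ground_truth(pixels, state):
    """Apply a state-specific gamma curve to intensities in [0, 1].
    Gamma < 1 brightens, gamma > 1 darkens, gamma == 1 is the identity."""
    gamma = GAMMA[state]
    return [p ** gamma for p in pixels]


# Toy vectors standing in for real CLIP embeddings (assumption).
prompts = {
    "underexposed": [1.0, 0.0, 0.0],
    "overexposed": [0.0, 1.0, 0.0],
    "well-exposed": [0.0, 0.0, 1.0],
}
image_emb = [0.9, 0.1, 0.2]

state, probs = classify_exposure(image_emb, prompts)
target = pseudo_ground_truth([0.04, 0.25, 0.81], state)
```

In an unsupervised setup of this kind, `target` would serve as the training target in place of a manually edited label; in practice the embeddings would come from a CLIP encoder rather than hand-written vectors.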