Language Prompt vs. Image Enhancement: Boosting Object Detection With CLIP in Hazy Environments
For object detection in hazy environments, this work offers a novel approach that avoids unstable image enhancement modules, but the improvement is incremental because it builds on the existing CLIP model and a modified version of a standard loss function.
Object detection in hazy environments is challenging because haze weakens object semantics. The authors propose using language prompts with CLIP to strengthen those semantics without image enhancement. They report state-of-the-art performance and introduce a new synthetic hazy dataset (HazyCOCO).
Object detection in hazy environments is challenging because degraded objects are nearly invisible and their semantics are weakened by environmental noise, making it difficult for detectors to identify them. Common approaches use image enhancement to boost weakened semantics, but these methods are limited by the instability of the enhancement modules. This paper proposes a novel solution that employs language prompts to enhance weakened semantics without image enhancement. Specifically, we design Approximation of Mutual Exclusion (AME) to provide credible weights for Cross-Entropy Loss, resulting in CLIP-guided Cross-Entropy Loss (CLIP-CE). The provided weights assess the semantic weakening of objects. Through the backpropagation of CLIP-CE, weakened semantics are enhanced, making degraded objects easier to detect. In addition, we present Fine-tuned AME (FAME), which adaptively fine-tunes the weight of AME based on the predicted confidence. The proposed FAME compensates for the imbalanced optimization in AME. Furthermore, we present HazyCOCO, a large-scale synthetic hazy dataset comprising 61,258 images. Experimental results demonstrate that our method achieves state-of-the-art performance. The code and dataset will be released.
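The idea of weighting Cross-Entropy Loss by a CLIP-derived measure of semantic weakening can be illustrated with a minimal sketch. The abstract does not give the AME formula, so the weighting rule below (`hypothetical_clip_weight`, which grows as CLIP's text-prompt probability for the true class drops) is purely an assumption for illustration and is not the paper's AME or FAME. The similarity scores, the temperature value, and all function names are likewise hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hypothetical_clip_weight(clip_similarities, label, temperature=0.01):
    """Assumed stand-in for a CLIP-derived weight (NOT the paper's AME).

    clip_similarities: image-to-prompt similarities, one per class prompt.
    Returns a value in (1, 2): close to 1 when CLIP confidently matches the
    true class prompt, larger when the object's semantics look weakened.
    """
    probs = softmax([s / temperature for s in clip_similarities])
    return 2.0 - probs[label]


def weighted_cross_entropy(logits, label, weight):
    """Cross-entropy for one object, scaled by an externally supplied weight."""
    probs = softmax(logits)
    return -weight * math.log(probs[label])


# Illustrative values only: a clear object matches its prompt strongly,
# while a hazy object yields nearly flat similarities.
clear_weight = hypothetical_clip_weight([0.30, 0.20, 0.18], label=0)
hazy_weight = hypothetical_clip_weight([0.22, 0.21, 0.20], label=0)

detector_logits = [1.2, 0.4, 0.1]  # same detector output in both cases
clear_loss = weighted_cross_entropy(detector_logits, 0, clear_weight)
hazy_loss = weighted_cross_entropy(detector_logits, 0, hazy_weight)
```

Under these assumptions, the hazy object receives the larger weight and therefore contributes a larger loss and gradient for the same detector output, which is the general mechanism the abstract attributes to CLIP-CE.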