CV · Jun 2, 2025

CLIP-driven rain perception: Adaptive deraining with pattern-aware network routing and mask-guided cross-attention

arXiv:2506.01366v1 · 1 citation · h-index: 1 · Pattern Recognition
Originality: Incremental advance
AI Analysis

This paper addresses image deraining for computer vision applications, offering an incremental improvement through adaptive routing and attention mechanisms.

The paper tackles the problem of removing rain from images by proposing CLIP-RPN, a network that adaptively routes different rain patterns to specialized sub-networks using CLIP's visual-language matching, achieving state-of-the-art performance on multiple datasets.
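The routing idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the image and a set of text prompts have already been embedded by CLIP, and the prompt wording, the toy 3-dimensional embeddings, the `temperature` value, and the function names `cosine`, `softmax`, and `route` are all hypothetical. It shows only the general pattern of scoring an image against rain-pattern prompts and picking a branch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature):
    """Numerically stable softmax over temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def route(image_embedding, prompt_embeddings, temperature=0.01):
    """Return (branch index, matching probabilities) for one image.

    Assumption: one sub-network per prompt, selected by the highest
    visual-language matching score. The paper may combine branches
    differently.
    """
    sims = [cosine(image_embedding, p) for p in prompt_embeddings]
    probs = softmax(sims, temperature)
    branch = max(range(len(probs)), key=probs.__getitem__)
    return branch, probs

# Hypothetical prompts, one per rain pattern (illustrative only).
prompts = ["light rain streaks", "heavy dense rain", "raindrops on glass"]
# Toy stand-ins for CLIP text and image embeddings.
prompt_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.9, 0.1, 0.0]

branch, probs = route(image_embedding, prompt_embeddings)
print(prompts[branch], probs)
```

A hard argmax is used here for simplicity; a soft mixture weighted by `probs` would be an equally plausible reading of "adaptive routing".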

Existing deraining models process all rainy images within a single network. However, rain patterns vary significantly, which makes it challenging for a single network to handle diverse types of raindrops and streaks. To address this limitation, we propose a novel CLIP-driven rain perception network (CLIP-RPN) that leverages CLIP to automatically perceive rain patterns by computing visual-language matching scores and adaptively routing to sub-networks that handle different rain patterns, such as varying raindrop densities, streak orientations, and rainfall intensities. CLIP-RPN establishes semantic-aware rain pattern recognition through CLIP's cross-modal visual-language alignment capabilities, enabling automatic identification of precipitation characteristics across different rain scenarios. This rain pattern awareness drives an adaptive sub-network routing mechanism in which specialized processing branches are dynamically activated based on the detected rain type, significantly enhancing the model's capacity to handle diverse rainfall conditions. Furthermore, within the sub-networks of CLIP-RPN, we introduce a mask-guided cross-attention mechanism (MGCA) that predicts precise rain masks at multiple scales and uses cross-attention to facilitate contextual interactions between rainy regions and clean background areas. We also introduce a dynamic loss scheduling mechanism (DLS) to adaptively adjust the gradients during the optimization of CLIP-RPN. Compared with the commonly used $l_1$ or $l_2$ loss, DLS is more compatible with the inherent dynamics of the network training process, thus achieving better results. Our method achieves state-of-the-art performance across multiple datasets, particularly excelling on complex mixed datasets.
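One way to picture the mask-guided cross-attention is the single-scale sketch below. It is an assumption-laden illustration, not the MGCA module from the paper: it uses NumPy, omits learned query/key/value projections and the multi-scale mask prediction, and the function name `mask_guided_cross_attention`, the log-mask bias, and the blending rule are all hypothetical choices. It shows only how a rain mask can steer rainy tokens to gather context from background tokens.

```python
import numpy as np

def mask_guided_cross_attention(tokens, rain_mask, eps=1e-6):
    """Toy mask-guided attention over N feature tokens.

    tokens:    (N, d) array of per-location features.
    rain_mask: (N,) array in [0, 1]; 1 means "rainy", 0 means "clean".
    Assumption: keys are biased toward clean locations, and only rainy
    locations are updated with the attended context.
    """
    d = tokens.shape[1]
    # Scaled dot-product scores between every query and every key.
    logits = tokens @ tokens.T / np.sqrt(d)
    # Down-weight rainy keys so context comes mostly from background.
    logits = logits + np.log(1.0 - rain_mask + eps)[None, :]
    # Row-wise softmax, shifted for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    context = weights @ tokens
    # Blend: rainy tokens take the context, clean tokens are kept as-is.
    gate = rain_mask[:, None]
    return (1.0 - gate) * tokens + gate * context

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
rain_mask = np.array([1.0, 0.0, 0.0, 0.0])  # only the first token is rainy
out = mask_guided_cross_attention(tokens, rain_mask)
print(out.shape)
```

In this toy setup the three clean tokens pass through unchanged, while the rainy token is replaced by a weighted average drawn almost entirely from the clean ones.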
