CV · Mar 7

Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network

Shixuan Xu, Yabo Liu, Junyu Dong, Xinghui Dong

arXiv:2603.07076v1 · 7.0 · h-index: 17

Predicted impact: top 74% in CV · last 90 days · Originality: Highly original

AI Analysis

This work addresses the problem of enhancing degraded underwater images for various applications by introducing textual guidance and a multimodal dataset, which is a novel approach for the UIE task.

Underwater images suffer from degradation due to light absorption and scattering. This paper introduces PSG-UIENet, a network that combines Retinex-based illumination correction with language-informed guidance to enhance underwater images. It achieves superior or comparable performance against fifteen state-of-the-art methods on both a newly constructed dataset and four public datasets.

Underwater images often suffer from severe degradation caused by light absorption and scattering, leading to color distortion, low contrast and reduced visibility. Existing Underwater Image Enhancement (UIE) methods fall into two categories: prior-based and learning-based. The former rely on rigid physical assumptions that limit their adaptability, while the latter often face data scarcity and weak generalization. To address these issues, we propose a Physics-Semantics-Guided Underwater Image Enhancement Network (PSG-UIENet), which couples Retinex-grounded illumination correction with language-informed guidance. The network comprises a Prior-Free Illumination Estimator, a Cross-Modal Text Aligner and a Semantics-Guided Image Restorer. In particular, the restorer leverages textual descriptions generated by the Contrastive Language-Image Pre-training (CLIP) model to inject high-level semantics for perceptually meaningful guidance. Since multimodal UIE data sets are not publicly available, we also construct a large-scale image-text UIE data set, LUIQD-TD, which contains 6,418 image-reference-text triplets. To explicitly measure and optimize semantic consistency between textual descriptions and images, we further design an Image-Text Semantic Similarity (ITSS) loss function. To our knowledge, this study is the first to introduce both textual guidance and a multimodal data set into UIE tasks. Extensive experiments on our data set and four publicly available data sets demonstrate that the proposed PSG-UIENet achieves superior or comparable performance against fifteen state-of-the-art methods.
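
As a rough illustration of the two ideas the abstract combines, the sketch below shows (1) the classic Retinex model, in which an observed image is the element-wise product of reflectance and illumination, and (2) a cosine-distance penalty between an image embedding and a text embedding. This is not the authors' implementation. The abstract does not define the ITSS loss or the illumination estimator, so the function names `retinex_reflectance` and `similarity_loss`, the cosine form of the loss, and the assumption that an illumination map and embeddings are already available are all hypothetical.

```python
import math

def retinex_reflectance(image, illumination, eps=1e-6):
    """Recover reflectance R from the Retinex model I = R * L (element-wise).

    `image` and `illumination` are same-shaped 2D lists of floats in [0, 1].
    Assumption: the illumination map is given; in the paper it would come
    from a learned estimator, which is not reproduced here.
    """
    return [
        # Divide by illumination (guarded against zero), then clamp to [0, 1].
        [min(1.0, max(0.0, i / max(l, eps))) for i, l in zip(img_row, ill_row)]
        for img_row, ill_row in zip(image, illumination)
    ]

def similarity_loss(image_emb, text_emb, eps=1e-8):
    """Hypothetical stand-in for an image-text similarity loss.

    Returns 1 - cosine_similarity(image_emb, text_emb), so aligned
    embeddings give a loss near 0 and orthogonal ones a loss of 1.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_img = math.sqrt(sum(a * a for a in image_emb))
    norm_txt = math.sqrt(sum(b * b for b in text_emb))
    return 1.0 - dot / (norm_img * norm_txt + eps)

# Usage: a dim 1x2 "image" lit at half intensity, and two toy embeddings.
restored = retinex_reflectance([[0.2, 0.4]], [[0.5, 0.5]])
loss = similarity_loss([1.0, 0.0], [0.0, 1.0])
```

In practice the embeddings would come from image and text encoders such as CLIP's, and the loss would be computed on batched tensors in a deep learning framework; plain lists are used here only to keep the sketch dependency-free.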
