Erkang Chen

CV
16 papers
560 citations
Novelty 52%
AI Score 32

16 Papers

CV · Mar 21, 2022 · Code
Underwater Light Field Retention: Neural Rendering for Underwater Imaging

Tian Ye, Sixiang Chen, Yun Liu et al.

Underwater image rendering aims to generate a true-to-life underwater image from a given clean one, and can be applied to practical tasks such as underwater image enhancement, camera filters, and virtual gaming. We explore two less-touched but challenging problems in underwater image rendering: i) how to render diverse underwater scenes with a single neural network, and ii) how to adaptively learn underwater light fields from natural exemplars, i.e., realistic underwater images. To this end, we propose a neural rendering method for underwater imaging, dubbed UWNR (Underwater Neural Rendering). Specifically, UWNR is a data-driven neural network that implicitly learns the natural degradation model from authentic underwater images, avoiding the erroneous biases introduced by hand-crafted imaging models. Compared with existing underwater image generation methods, UWNR utilizes the natural light field to simulate the main characteristics of the underwater scene. Thus, it is able to synthesize a wide variety of underwater images from one clean image and various realistic underwater images. Extensive experiments demonstrate that our approach achieves better visual effects and quantitative metrics than previous methods. Moreover, we adopt UWNR to build an open Large Neural Rendering Underwater Dataset containing various types of water quality, dubbed LNRUD. The source code and LNRUD are available at https://github.com/Ephemeral182/UWNR.

CV · Aug 20, 2022 · Code
SnowFormer: Context Interaction Transformer with Scale-awareness for Single Image Desnowing

Sixiang Chen, Tian Ye, Yun Liu et al.

Due to varied and complicated snow degradations, single image desnowing is a challenging image restoration task. As prior art cannot handle it well, we propose a novel transformer, SnowFormer, which explores efficient cross-attention to build local-global context interaction across patches and surpasses existing works that employ local operators or vanilla transformers. Compared to prior desnowing methods and universal image restoration methods, SnowFormer has several benefits. First, unlike the multi-head self-attention in recent image restoration Vision Transformers, SnowFormer incorporates a multi-head cross-attention mechanism to perform local-global context interaction between scale-aware snow queries and local-patch embeddings. Second, the snow queries in SnowFormer are generated by the query generator from aggregated scale-aware features, which are rich in potential clean cues, leading to superior restoration results. Third, SnowFormer outperforms state-of-the-art desnowing networks and the prevalent universal image restoration transformers on six synthetic and real-world datasets. The code is released at https://github.com/Ephemeral182/SnowFormer.
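The cross-attention idea in this abstract can be illustrated with a minimal, single-head sketch of generic scaled dot-product cross-attention. This is not SnowFormer's implementation: the learned projections, multi-head split, and query generator are omitted, and the function names are hypothetical. Here `queries` stands in for the snow queries, while `keys` and `values` stand in for local-patch embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query vector attends over all key vectors and mixes the values.

    queries: list of d-dimensional vectors (assumed: snow queries)
    keys, values: lists of vectors (assumed: patch embeddings)
    """
    d = len(keys[0])
    dim_out = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_out)]
        )
    return outputs
```

The difference from self-attention is only where the inputs come from: queries and keys/values are drawn from two different sources rather than from the same token sequence.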

CV · Jul 12, 2022
MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing

Sixiang Chen, Tian Ye, Yun Liu et al.

Snow removal is challenging because snow causes complex degradations. Targeted treatment of multi-scale snow degradations is therefore critical for the network to learn effective snow removal. To handle diverse scenes, we propose a multi-scale projection transformer (MSP-Former), which understands and covers a variety of snow degradation features in a multi-path manner, and integrates comprehensive scene context information for clean reconstruction via self-attention. For the local details of various snow degradations, a local capture module is introduced in parallel to assist in rebuilding a clean image. This design achieves SOTA performance on three desnowing benchmark datasets with a low parameter count and low computational complexity, which makes it practical.

CV · Aug 27, 2023
Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks

Sixiang Chen, Tian Ye, Jinbin Bai et al.

In the real world, image degradations caused by rain often exhibit a combination of rain streaks and raindrops, thereby increasing the challenge of recovering the underlying clean image. Rain streaks and raindrops have diverse shapes, sizes, and locations in the captured image, so modeling the correlations between irregular degradations caused by rain artifacts is a necessary prerequisite for image deraining. This paper aims to present an efficient and flexible mechanism to learn and model degradation relationships in a global view, thereby achieving unified removal of intricate rain scenes. To do so, we propose a Sparse Sampling Transformer based on Uncertainty-Driven Ranking, dubbed UDR-S2Former. Compared to previous methods, our UDR-S2Former has three merits. First, it can adaptively sample relevant image degradation information to model underlying degradation relationships. Second, explicit application of the uncertainty-driven ranking strategy helps the network attend to degradation features and understand the reconstruction process. Finally, experimental results show that our UDR-S2Former clearly outperforms state-of-the-art methods on all benchmarks.

CV · Oct 3, 2022
Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration

Sixiang Chen, Tian Ye, Yun Liu et al.

Recently, image restoration transformers have achieved performance comparable to previous state-of-the-art CNNs. However, how to efficiently leverage such architectures remains an open problem. In this work, we present Dual-former, whose critical insight is to combine the powerful global modeling ability of self-attention modules and the local modeling ability of convolutions in an overall architecture. With convolution-based Local Feature Extraction modules equipped in the encoder and the decoder, we only adopt a novel Hybrid Transformer Block in the latent layer to model the long-distance dependence in spatial dimensions and handle the uneven distribution between channels. Such a design eliminates the substantial computational complexity of previous image restoration transformers and achieves superior performance on multiple image restoration tasks. Experiments demonstrate that Dual-former achieves a 1.91 dB gain over the state-of-the-art MAXIM method on the Indoor dataset for single image dehazing while consuming only 4.2% of MAXIM's GFLOPs. For single image deraining, it exceeds the SOTA method by 0.1 dB PSNR on the average results of five datasets with only 21.5% of the GFLOPs. Dual-former also substantially surpasses the latest desnowing method on various datasets, with fewer parameters.

CV · Jul 12, 2022
Towards Real-time High-Definition Image Snow Removal: Efficient Pyramid Network with Asymmetrical Encoder-decoder Architecture

Tian Ye, Sixiang Chen, Yun Liu et al.

In winter scenes, the degradation of images taken under snow can be quite complex, and the spatial distribution of snowy degradation varies from image to image. Recent methods adopt deep neural networks to directly recover clean scenes from snowy images. However, due to the paradox caused by the variation of complex snowy degradation, achieving reliable High-Definition image desnowing performance in real time is a considerable challenge. We develop a novel Efficient Pyramid Network with an asymmetrical encoder-decoder architecture for real-time HD image desnowing. The general idea of our proposed network is to fully utilize the multi-scale feature flow and implicitly mine clean cues from features. Compared with previous state-of-the-art desnowing methods, our approach achieves a better complexity-performance trade-off and effectively handles the processing difficulties of HD and Ultra-HD images. Extensive experiments on three large-scale image desnowing datasets demonstrate that our method surpasses all state-of-the-art approaches by a large margin both quantitatively and qualitatively, boosting the PSNR metric from 31.76 dB to 34.10 dB on the CSD test dataset and from 28.29 dB to 30.87 dB on the SRRS test dataset.
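Several abstracts on this page report results as PSNR in dB. As a reading aid, here is a minimal sketch of the standard PSNR definition, assuming images are given as flat lists of pixel values in the range [0, `max_value`]. The function name and the flat-list representation are illustrative choices, not taken from any of the papers.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error between the reference and the restored image.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of 2.34 dB (31.76 dB to 34.10 dB) corresponds to an MSE that is lower by a factor of 10^0.234, roughly 1.7.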

CV · Mar 13, 2023
DEHRFormer: Real-time Transformer for Depth Estimation and Haze Removal from Varicolored Haze Scenes

Sixiang Chen, Tian Ye, Jun Shi et al.

Varicolored haze caused by chromatic casts poses challenges for haze removal and depth estimation. Recent learning-based depth estimation methods mainly dehaze first and then estimate depth from the haze-free scenes. This way, the inner connections between colored haze and scene depth are lost. In this paper, we propose a real-time transformer for simultaneous single image Depth Estimation and Haze Removal (DEHRFormer). DEHRFormer consists of a single encoder and two task-specific decoders. The transformer decoders with learnable queries are designed to decode coupling features from the task-agnostic encoder and project them into a clean image and a depth map, respectively. In addition, we introduce a novel learning paradigm that utilizes contrastive learning and domain consistency learning to tackle the weak-generalization problem in real-world dehazing, while predicting the same depth map from the same scene with varicolored haze. Experiments demonstrate that DEHRFormer achieves significant performance improvement across diverse varicolored haze scenes over previous depth estimation networks and dehazing approaches.

CV · Jul 16, 2024
Haze-Aware Attention Network for Single-Image Dehazing

Lihan Tong, Yun Liu, Weijia Li et al.

Single-image dehazing is a pivotal challenge in computer vision that seeks to remove haze from images and restore clean background details. Recognizing the limitations of traditional physical model-based methods and the inefficiencies of current attention-based solutions, we propose a new dehazing network combining an innovative Haze-Aware Attention Module (HAAM) with a Multiscale Frequency Enhancement Module (MFEM). The HAAM is inspired by the atmospheric scattering model, thus integrating physical principles into high-dimensional features for targeted dehazing. The HAAM picks up on latent features during the image restoration process, which gives a significant boost to the metrics, while the MFEM efficiently enhances high-frequency details, sidestepping the complexities of wavelet or Fourier transforms. The MFEM employs multiscale fields to extract and emphasize key frequency components with minimal parameter overhead. Integrated into a simple U-Net framework, our Haze-Aware Attention Network (HAA-Net) for single-image dehazing significantly outperforms existing attention-based and transformer models in efficiency and effectiveness. Tested across various public datasets, HAA-Net sets new performance benchmarks. Our work not only advances the field of image dehazing but also offers insights into the design of attention mechanisms for broader applications in computer vision.
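For context, the classical atmospheric scattering model that the HAAM is said to be inspired by is usually written as I(x) = J(x) t(x) + A (1 - t(x)), where I is the hazy image, J the clean scene, t the transmission, and A the global airlight. The sketch below applies and inverts that formula on flat pixel lists. It illustrates the physical model only, not the HAAM itself, and the function names and the `t_min` clamp are assumptions made for this example.

```python
def add_haze(clean, transmission, airlight):
    """Synthesize hazy pixels: I = J * t + A * (1 - t)."""
    return [j * t + airlight * (1.0 - t) for j, t in zip(clean, transmission)]


def remove_haze(hazy, transmission, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A.

    t_min (an assumed safeguard) avoids dividing by a near-zero
    transmission in very dense haze.
    """
    return [
        (i - airlight) / max(t, t_min) + airlight
        for i, t in zip(hazy, transmission)
    ]
```

When the transmission and airlight are known exactly, `remove_haze(add_haze(J, t, A), t, A)` recovers `J` up to floating-point error. In practice both quantities have to be estimated, which is where learned methods come in.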

CV · Apr 19, 2022
Towards Efficient Single Image Dehazing and Desnowing

Tian Ye, Sixiang Chen, Yun Liu et al.

Removing adverse weather conditions like rain, fog, and snow from images is a challenging problem. Although current recovery algorithms targeting a specific condition have made impressive progress, they are not flexible enough to deal with various degradation types. To address this problem, we propose an efficient and compact image restoration network named DAN-Net (Degradation-Adaptive Neural Network), which consists of multiple compact expert networks with one adaptive gated neural network. A single expert network efficiently addresses a specific degradation in harsh winter scenes, relying on the compact architecture and three novel components. Based on the Mixture of Experts strategy, DAN-Net captures degradation information from each input image to adaptively modulate the outputs of task-specific expert networks and remove various adverse winter weather conditions. Specifically, it adopts a lightweight Adaptive Gated Neural Network to estimate gated attention maps of the input image, while different task-specific experts with the same topology are jointly dispatched to process the degraded image. This novel image restoration pipeline handles different types of severe weather scenes effectively and efficiently. It also enjoys the benefit of coordinate boosting, in which the whole network outperforms each expert trained without coordination. Extensive experiments demonstrate that the presented method outperforms state-of-the-art single-task methods on image quality and has better inference efficiency. Furthermore, we have collected the first real-world winter scenes dataset to evaluate winter image restoration methods, which contains various hazy and snowy images captured in winter. Both the dataset and source code will be publicly available.
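The gating step described above can be sketched roughly as follows: each expert produces an output per pixel, a gate produces one score per expert per pixel, and the scores are normalized into weights that blend the expert outputs. The softmax normalization, the flat per-pixel lists, and all names here are assumptions made for illustration. The abstract does not specify how DAN-Net's gated attention maps are computed or applied.

```python
import math


def gate_weights(logits):
    """Softmax over the per-expert gate scores of a single pixel."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_experts(expert_outputs, gate_logits):
    """Blend expert outputs pixel by pixel.

    expert_outputs[e][p]: value predicted by expert e at pixel p
    gate_logits[e][p]:    gate score for expert e at pixel p
    """
    num_experts = len(expert_outputs)
    num_pixels = len(expert_outputs[0])
    fused = []
    for p in range(num_pixels):
        weights = gate_weights([gate_logits[e][p] for e in range(num_experts)])
        fused.append(
            sum(weights[e] * expert_outputs[e][p] for e in range(num_experts))
        )
    return fused
```

With equal gate scores the result is the plain average of the experts. A strongly dominant score makes the output follow a single expert, which is how a gate can route hazy and snowy regions to different specialists.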

CV · Mar 17, 2022
Mutual Learning for Domain Adaptation: Self-distillation Image Dehazing Network with Sample-cycle

Tian Ye, Yun Liu, Yunchen Zhang et al.

Deep learning-based methods have made significant achievements in image dehazing. However, most existing dehazing networks concentrate on training models with simulated hazy images, resulting in degraded generalization performance when applied to real-world hazy images because of domain shift. In this paper, we propose a mutual learning dehazing framework for domain adaptation. Specifically, we first devise two siamese networks: a teacher network in the synthetic domain and a student network in the real domain, and then optimize them in a mutual learning manner by leveraging EMA and a joint loss. Moreover, we design a sample-cycle strategy based on a density augmentation (HDA) module to introduce pseudo real-world image pairs provided by the student network into training, further improving the generalization performance. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed mutual learning framework outperforms state-of-the-art dehazing techniques in terms of subjective and objective evaluation.
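EMA (exponential moving average) weight updates are a common way to couple two networks of identical architecture. The sketch below shows the generic update rule on flat parameter lists. Which network acts as the EMA target, the decay value, and the function name are all assumptions for illustration, since the abstract does not give these details.

```python
def ema_update(target_params, source_params, decay=0.999):
    """Move each target parameter a small step toward the source parameter.

    new_target = decay * target + (1 - decay) * source
    """
    if len(target_params) != len(source_params):
        raise ValueError("parameter lists must have the same length")
    return [
        decay * t + (1.0 - decay) * s
        for t, s in zip(target_params, source_params)
    ]
```

A decay close to 1 makes the target network a slowly changing, smoothed copy of the source network, which tends to stabilize the pseudo labels it produces.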

CV · Feb 23, 2023
RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

Jingxia Jiang, Jinbin Bai, Yun Liu et al.

Underwater images typically experience mixed degradations of brightness and structure caused by the absorption and scattering of light by suspended particles. To address this issue, we propose a Real-time Spatial and Frequency Domains Modulation Network (RSFDM-Net) for the efficient enhancement of colors and details in underwater images. Specifically, our proposed conditional network is designed with an Adaptive Fourier Gating Mechanism (AFGM) and a Multiscale Convolutional Attention Module (MCAM) to generate vectors carrying low-frequency background information and high-frequency detail features, which help the network model global background information and local texture details. To more precisely correct the color cast and low saturation of the image, we introduce a Three-branch Feature Extraction (TFE) block in the primary net that processes images pixel by pixel to integrate the color information extended by the same channel (R, G, or B). This block consists of three small branches, each of which has its own weights. Extensive experiments demonstrate that our network significantly outperforms state-of-the-art methods in both visual quality and quantitative metrics.

CV · May 15, 2023 · Code
Five A$^{+}$ Network: You Only Need 9K Parameters for Underwater Image Enhancement

Jingxia Jiang, Tian Ye, Jinbin Bai et al.

A lightweight underwater image enhancement network is of great significance for resource-constrained platforms, but balancing model size, computational efficiency, and enhancement performance has proven difficult for previous approaches. In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only $\sim$9k parameters and $\sim$0.01s processing time. The FA$^{+}$Net employs a two-stage enhancement structure. The strong prior stage aims to decompose challenging underwater degradations into sub-problems, while the fine-grained stage incorporates a multi-branch color enhancement module and a pixel attention module to amplify the network's perception of details. To the best of our knowledge, FA$^{+}$Net is the only network with the capability of real-time enhancement of 1080P images. Through extensive experiments and comprehensive visual comparison, we show that FA$^{+}$Net outperforms previous approaches by obtaining state-of-the-art performance on multiple datasets while significantly reducing both parameter count and computational complexity. The code is open source at https://github.com/Owen718/FiveAPlus-Network.

CV · May 9, 2024
Parallel Cross Strip Attention Network for Single Image Dehazing

Lihan Tong, Yun Liu, Tian Ye et al.

The objective of single image dehazing is to restore hazy images and produce clear, high-quality visuals. Traditional convolutional models struggle with long-range dependencies due to their limited receptive field size. While Transformers excel at capturing such dependencies, their quadratic computational complexity in relation to feature map resolution makes them less suitable for pixel-to-pixel dense prediction tasks. Moreover, fixed kernels or tokens in most models do not adapt well to varying blur sizes, resulting in suboptimal dehazing performance. In this study, we introduce a novel dehazing network based on Parallel Stripe Cross Attention (PCSA) with a multi-scale strategy. PCSA efficiently integrates long-range dependencies by simultaneously capturing horizontal and vertical relationships, allowing each pixel to capture contextual cues from an expanded spatial domain. To handle different sizes and shapes of blurs flexibly, we employ a channel-wise design with varying convolutional kernel sizes and strip lengths in each PCSA to capture context information at different scales. Additionally, we incorporate a softmax-based adaptive weighting mechanism within PCSA to prioritize and leverage more critical features.

CV · May 16, 2023
NightHazeFormer: Single Nighttime Haze Removal Using Prior Query Transformer

Yun Liu, Zhongsheng Yan, Sixiang Chen et al.

Nighttime image dehazing is a challenging task due to the presence of multiple types of adverse degrading effects, including glow, haze, blur, noise, and color distortion. However, most previous studies mainly focus on daytime image dehazing or on only some of the degradations present in nighttime hazy scenes, which may lead to unsatisfactory restoration results. In this paper, we propose an end-to-end transformer-based framework for nighttime haze removal, called NightHazeFormer. Our proposed approach consists of two stages: supervised pre-training and semi-supervised fine-tuning. During the pre-training stage, we introduce two powerful priors into the transformer decoder to generate the non-learnable prior queries, which guide the model to extract specific degradations. For the fine-tuning, we combine the generated pseudo ground truths with input real-world nighttime hazy images as paired images and feed them into the synthetic domain to fine-tune the pre-trained model. This semi-supervised fine-tuning paradigm helps improve generalization to the real domain. In addition, we propose a large-scale synthetic dataset called UNREAL-NH to simulate real-world nighttime haze scenarios comprehensively. Extensive experiments on several synthetic and real-world datasets demonstrate the superiority of our NightHazeFormer over state-of-the-art nighttime haze removal methods, both visually and quantitatively.

CV · Nov 18, 2021
Perceiving and Modeling Density is All You Need for Image Dehazing

Tian Ye, Mingchao Jiang, Yunchen Zhang et al.

In the real world, the degradation of images taken under haze can be quite complex, and the spatial distribution of haze varies from image to image. Recent methods adopt deep neural networks to recover clean scenes from hazy images directly. However, due to the paradox caused by the variation of real captured haze and the fixed degradation parameters of current networks, the generalization ability of recent dehazing methods on real-world hazy images is not ideal. To address the problem of modeling real-world haze degradation, we propose perceiving and modeling density for uneven haze distribution. To this end, we propose a novel Separable Hybrid Attention (SHA) module that encodes haze density by capturing features in the orthogonal directions. Moreover, a density map is proposed to model the uneven distribution of the haze explicitly. The density map generates positional encoding in a semi-supervised way. Such haze density perceiving and modeling effectively captures the unevenly distributed degradation at the feature level. Through a suitable combination of SHA and the density map, we design a novel dehazing network architecture, which achieves a good complexity-performance trade-off. Extensive experiments on two large-scale datasets demonstrate that our method surpasses all state-of-the-art approaches by a large margin both quantitatively and qualitatively, boosting the best published PSNR metric from 28.53 dB to 33.49 dB on the Haze4k test dataset and from 37.17 dB to 38.41 dB on the SOTS indoor test dataset.

IV · Sep 12, 2021
Efficient Re-parameterization Residual Attention Network For Nonhomogeneous Image Dehazing

Tian Ye, ErKang Chen, XinRui Huang et al.

This paper proposes an end-to-end Efficient Re-parameterization Residual Attention Network (ERRA-Net) to directly restore the nonhomogeneous hazy image. The contribution of this paper has three main aspects: 1) A novel Multi-branch Attention (MA) block. The spatial attention mechanism better reconstructs high-frequency features, and the channel attention mechanism treats the features of different channels differently. The multi-branch structure dramatically improves the representation ability of the model and can be changed into a single-path structure after re-parameterization to speed up inference. A Local Residual Connection allows the low-frequency information in the nonhomogeneous area to pass through the block without processing so that the block can focus on detailed features. 2) A lightweight network structure. We use cascaded MA blocks to extract high-frequency features step by step, and the multi-layer attention fusion tail combines the shallow and deep features of the model to obtain the residual of the clean image. 3) Two novel loss functions to help reconstruct the hazy image: ColorAttenuation loss and Laplace Pyramid loss. ERRA-Net has an impressive speed, processing 1200x1600 HD quality images at an average of 166.11 fps. Extensive evaluations demonstrate that ERRA-Net performs favorably against the SOTA approaches on real-world hazy images.
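The claim that a multi-branch block "can be changed into a single-path structure after re-parameterization" rests on the linearity of convolution: parallel linear branches that are summed can be folded into one kernel. The sketch below demonstrates this in 1D with a 3-tap branch, a 1-tap (scaling) branch, and an identity branch. The branch layout, the 1D setting, and the function names are assumptions chosen for brevity. The abstract does not describe the exact branches used in ERRA-Net's MA block.

```python
def conv1d_same(signal, kernel):
    """Cross-correlation with zero padding; output has the same length as the input."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def multi_branch(signal, k3, k1):
    """Training-time form: 3-tap branch + 1-tap branch + identity branch."""
    branch3 = conv1d_same(signal, k3)
    return [b + k1 * x + x for b, x in zip(branch3, signal)]


def merge_branches(k3, k1):
    """Inference-time form: fold all three branches into a single 3-tap kernel.

    The 1-tap weight and the identity (weight 1.0) both act on the centre tap.
    """
    return [k3[0], k3[1] + k1 + 1.0, k3[2]]
```

For any input, `conv1d_same(x, merge_branches(k3, k1))` matches `multi_branch(x, k3, k1)` up to floating-point error, so the merged kernel gives the same output with a single pass instead of three.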