Yongzhen Wang

h-index37

25papers

2,467citations

Novelty54%

AI Score62

Ranked #2,893 of 201,326 authors (top 1%)#1,476 in CV (top 3%)

25 Papers

CVMay 4, 2022Code

UCL-Dehaze: Towards Real-world Image Dehazing via Unsupervised Contrastive Learning

Yongzhen Wang, Xuefeng Yan, Fu Lee Wang et al.

While the wisdom of training an image dehazing model on synthetic hazy data can alleviate the difficulty of collecting real-world hazy/clean image pairs, it brings the well-known domain shift problem. From a different yet new perspective, this paper explores contrastive learning with an adversarial training effort to leverage unpaired real-world hazy and clean images, thus bridging the gap between synthetic and real-world haze is avoided. We propose an effective unsupervised contrastive learning paradigm for image dehazing, dubbed UCL-Dehaze. Unpaired real-world clean and hazy images are easily captured, and will serve as the important positive and negative samples respectively when training our UCL-Dehaze network. To train the network more effectively, we formulate a new self-contrastive perceptual loss function, which encourages the restored images to approach the positive samples and keep away from the negative samples in the embedding space. Besides the overall network architecture of UCL-Dehaze, adversarial training is utilized to align the distributions between the positive samples and the dehazed images. Compared with recent image dehazing works, UCL-Dehaze does not require paired data during training and utilizes unpaired positive/negative data to better enhance the dehazing performance. We conduct comprehensive experiments to evaluate our UCL-Dehaze and demonstrate its superiority over the state-of-the-arts, even only 1,800 unpaired real-world images are used to train our network. Source code has been available at https://github.com/yz-wang/UCL-Dehaze.

CVSep 3, 2022Code

TogetherNet: Bridging Image Restoration and Object Detection Together via Dynamic Enhancement Learning

Yongzhen Wang, Xuefeng Yan, Kaiwen Zhang et al.

Adverse weather conditions such as haze, rain, and snow often impair the quality of captured images, causing detection networks trained on normal images to generalize poorly in these scenarios. In this paper, we raise an intriguing question - if the combination of image restoration and object detection, can boost the performance of cutting-edge detectors in adverse weather conditions. To answer it, we propose an effective yet unified detection paradigm that bridges these two subtasks together via dynamic enhancement learning to discern objects in adverse weather conditions, called TogetherNet. Different from existing efforts that intuitively apply image dehazing/deraining as a pre-processing step, TogetherNet considers a multi-task joint learning problem. Following the joint learning scheme, clean features produced by the restoration network can be shared to learn better object detection in the detection network, thus helping TogetherNet enhance the detection capacity in adverse weather conditions. Besides the joint learning architecture, we design a new Dynamic Transformer Feature Enhancement module to improve the feature extraction and representation capabilities of TogetherNet. Extensive experiments on both synthetic and real-world datasets demonstrate that our TogetherNet outperforms the state-of-the-art detection approaches by a large margin both quantitatively and qualitatively. Source code is available at https://github.com/yz-wang/TogetherNet.

CVApr 6, 2022Code

Detail-recovery Image Deraining via Dual Sample-augmented Contrastive Learning

Yiyang Shen, Mingqiang Wei, Sen Deng et al.

The intricacy of rainy image contents often leads cutting-edge deraining models to image degradation including remnant rain, wrongly-removed details, and distorted appearance. Such degradation is further exacerbated when applying the models trained on synthetic data to real-world rainy images. We observe two types of domain gaps between synthetic and real-world rainy images: one exists in rain streak patterns; the other is the pixel-level appearance of rain-free images. To bridge the two domain gaps, we propose a semi-supervised detail-recovery image deraining network (Semi-DRDNet) with dual sample-augmented contrastive learning. Semi-DRDNet consists of three sub-networks:i) for removing rain streaks without remnants, we present a squeeze-and-excitation based rain residual network; ii) for encouraging the lost details to return, we construct a structure detail context aggregation based detail repair network; to our knowledge, this is the first time; and iii) for building efficient contrastive constraints for both rain streaks and clean backgrounds, we exploit a novel dual sample-augmented contrastive regularization network.Semi-DRDNet operates smoothly on both synthetic and real-world rainy data in terms of deraining robustness and detail accuracy. Comparisons on four datasets including our established Real200 show clear improvements of Semi-DRDNet over fifteen state-of-the-art methods. Code and dataset are available at https://github.com/syy-whu/DRD-Net.

CVSep 2, 2022Code

Contrastive Semantic-Guided Image Smoothing Network

Jie Wang, Yongzhen Wang, Yidan Feng et al.

Image smoothing is a fundamental low-level vision task that aims to preserve salient structures of an image while removing insignificant details. Deep learning has been explored in image smoothing to deal with the complex entanglement of semantic structures and trivial details. However, current methods neglect two important facts in smoothing: 1) naive pixel-level regression supervised by the limited number of high-quality smoothing ground-truth could lead to domain shift and cause generalization problems towards real-world images; 2) texture appearance is closely related to object semantics, so that image smoothing requires awareness of semantic difference to apply adaptive smoothing strengths. To address these issues, we propose a novel Contrastive Semantic-Guided Image Smoothing Network (CSGIS-Net) that combines both contrastive prior and semantic prior to facilitate robust image smoothing. The supervision signal is augmented by leveraging undesired smoothing effects as negative teachers, and by incorporating segmentation tasks to encourage semantic distinctiveness. To realize the proposed network, we also enrich the original VOC dataset with texture enhancement and smoothing labels, namely VOC-smooth, which first bridges image smoothing and semantic segmentation. Extensive experiments demonstrate that the proposed CSGIS-Net outperforms state-of-the-art algorithms by a large margin. Code and dataset are available at https://github.com/wangjie6866/CSGIS-Net.

62.1CVMay 27Code

REVEAL: Reference-Grounded Reasoning for Multimodal Manipulation Detection

Jun Zhou, Bingwen Hu, Yaxiong Wang et al.

Multimodal manipulation detection aims to simultaneously identify forged image--text pairs and localize tampered regions, yet existing methods typically rely on memorizing isolated artifacts and struggle with imperceptible manipulation traces or domain shifts. Inspired by human comparative reasoning, we reformulate this task as a reference-grounded verification problem, where authenticity is assessed by comparing a query against retrieved authentic evidence. We propose REVEAL Reference-Enabled Verification for Evidence Analysis and Localization), a framework explicitly designed for this comparative paradigm. To support this paradigm, we construct a large-scale reference library comprising 170K authentic news image--text pairs featuring over 40K public figures. Technically, REVEAL employs a difference-aware fusion mechanism to capture fine-grained discrepancies between the query and retrieved evidence. Furthermore, we introduce a task-decoupled Mixture-of-Experts (MoE) architecture to jointly execute instance-level detection and fine-grained grounding, effectively mitigating optimization conflicts between these heterogeneous objectives. Extensive experiments demonstrate that REVEAL significantly outperforms state-of-the-art methods, and notably enables \emph{training-free domain adaptation} by simply updating the reference library, offering a robust and practical solution for detecting evolving misinformation. Code is available at https://anonymous.4open.science/r/REVEAL-Reference-A006.

CVMar 31, 2023Code

Joint Depth Estimation and Mixture of Rain Removal From a Single Image

Yongzhen Wang, Xuefeng Yan, Yanbiao Niu et al.

Rainy weather significantly deteriorates the visibility of scene objects, particularly when images are captured through outdoor camera lenses or windshields. Through careful observation of numerous rainy photos, we have found that the images are generally affected by various rainwater artifacts such as raindrops, rain streaks, and rainy haze, which impact the image quality from both near and far distances, resulting in a complex and intertwined process of image degradation. However, current deraining techniques are limited in their ability to address only one or two types of rainwater, which poses a challenge in removing the mixture of rain (MOR). In this study, we propose an effective image deraining paradigm for Mixture of rain REmoval, called DEMore-Net, which takes full account of the MOR effect. Going beyond the existing deraining wisdom, DEMore-Net is a joint learning paradigm that integrates depth estimation and MOR removal tasks to achieve superior rain removal. The depth information can offer additional meaningful guidance information based on distance, thus better helping DEMore-Net remove different types of rainwater. Moreover, this study explores normalization approaches in image deraining tasks and introduces a new Hybrid Normalization Block (HNB) to enhance the deraining performance of DEMore-Net. Extensive experiments conducted on synthetic datasets and real-world MOR photos fully validate the superiority of the proposed DEMore-Net. Code is available at https://github.com/yz-wang/DEMore-Net.

CVOct 28, 2022

Semi-UFormer: Semi-supervised Uncertainty-aware Transformer for Image Dehazing

Ming Tong, Yongzhen Wang, Peng Cui et al.

Image dehazing is fundamental yet not well-solved in computer vision. Most cutting-edge models are trained in synthetic data, leading to the poor performance on real-world hazy scenarios. Besides, they commonly give deterministic dehazed images while neglecting to mine their uncertainty. To bridge the domain gap and enhance the dehazing performance, we propose a novel semi-supervised uncertainty-aware transformer network, called Semi-UFormer. Semi-UFormer can well leverage both the real-world hazy images and their uncertainty guidance information. Specifically, Semi-UFormer builds itself on the knowledge distillation framework. Such teacher-student networks effectively absorb real-world haze information for quality dehazing. Furthermore, an uncertainty estimation block is introduced into the model to estimate the pixel uncertainty representations, which is then used as a guidance signal to help the student network produce haze-free images more accurately. Extensive experiments demonstrate that Semi-UFormer generalizes well from synthetic to real-world images.

CVOct 29, 2022

iSmallNet: Densely Nested Network with Label Decoupling for Infrared Small Target Detection

Zhiheng Hu, Yongzhen Wang, Peng Li et al.

Small targets are often submerged in cluttered backgrounds of infrared images. Conventional detectors tend to generate false alarms, while CNN-based detectors lose small targets in deep layers. To this end, we propose iSmallNet, a multi-stream densely nested network with label decoupling for infrared small object detection. On the one hand, to fully exploit the shape information of small targets, we decouple the original labeled ground-truth (GT) map into an interior map and a boundary one. The GT map, in collaboration with the two additional maps, tackles the unbalanced distribution of small object boundaries. On the other hand, two key modules are delicately designed and incorporated into the proposed network to boost the overall performance. First, to maintain small targets in deep layers, we develop a multi-scale nested interaction module to explore a wide range of context information. Second, we develop an interior-boundary fusion module to integrate multi-granularity information. Experiments on NUAA-SIRST and NUDT-SIRST clearly show the superiority of iSmallNet over 11 state-of-the-art detectors.

CVJan 23, 2023

Rethinking Real-world Image Deraining via An Unpaired Degradation-Conditioned Diffusion Model

Yiyang Shen, Mingqiang Wei, Yongzhen Wang et al.

Recent diffusion models have exhibited great potential in generative modeling tasks. Part of their success can be attributed to the ability of training stable on huge sets of paired synthetic data. However, adapting these models to real-world image deraining remains difficult for two aspects. First, collecting a large-scale paired real-world clean/rainy dataset is unavailable while regular conditional diffusion models heavily rely on paired data for training. Second, real-world rain usually reflects real-world scenarios with a variety of unknown rain degradation types, which poses a significant challenge for the generative modeling process. To meet these challenges, we propose RainDiff, the first real-world image deraining paradigm based on diffusion models, serving as a new standard bar for real-world image deraining. We address the first challenge by introducing a stable and non-adversarial unpaired cycle-consistent architecture that can be trained, end-to-end, with only unpaired data for supervision; and the second challenge by proposing a degradation-conditioned diffusion model that refines the desired output via a diffusive generative process conditioned by learned priors of multiple rain degradations. Extensive experiments confirm the superiority of our RainDiff over existing unpaired/semi-supervised methods and show its competitive advantages over several fully-supervised ones.

CVApr 28, 2022

Semi-MoreGAN: A New Semi-supervised Generative Adversarial Network for Mixture of Rain Removal

Yiyang Shen, Yongzhen Wang, Mingqiang Wei et al.

Rain is one of the most common weather which can completely degrade the image quality and interfere with the performance of many computer vision tasks, especially under heavy rain conditions. We observe that: (i) rain is a mixture of rain streaks and rainy haze; (ii) the scene depth determines the intensity of rain streaks and the transformation into the rainy haze; (iii) most existing deraining methods are only trained on synthetic rainy images, and hence generalize poorly to the real-world scenes. Motivated by these observations, we propose a new SEMI-supervised Mixture Of rain REmoval Generative Adversarial Network (Semi-MoreGAN), which consists of four key modules: (I) a novel attentional depth prediction network to provide precise depth estimation; (ii) a context feature prediction network composed of several well-designed detailed residual blocks to produce detailed image context features; (iii) a pyramid depth-guided non-local network to effectively integrate the image context with the depth information, and produce the final rain-free images; and (iv) a comprehensive semi-supervised loss function to make the model not limited to synthetic datasets but generalize smoothly to real-world heavy rainy scenes. Extensive experiments show clear improvements of our approach over twenty representative state-of-the-arts on both synthetic and real-world rainy images.

47.8CVApr 13

UHD-GPGNet: UHD Video Denoising via Gaussian-Process-Guided Local Spatio-Temporal Modeling

Weiyuan He, Chen Wu, Pengwen Dai et al.

Ultra-high-definition (UHD) video denoising requires simultaneously suppressing complex spatio-temporal degradations, preserving fine textures and chromatic stability, and maintaining efficient full-resolution 4K deployment. In this paper, we propose UHD-GPGNet, a Gaussian-process-guided local spatio-temporal denoising framework that addresses these requirements jointly. Rather than relying on implicit feature learning alone, the method estimates sparse GP posterior statistics over compact spatio-temporal descriptors to explicitly characterize local degradation response and uncertainty, which then guide adaptive temporal-detail fusion. A structure-color collaborative reconstruction head decouples luminance, chroma, and high-frequency correction, while a heteroscedastic objective and overlap-tiled inference further stabilize optimization and enable memory-bounded 4K deployment. Experiments on UVG and RealisVideo-4K show that UHD-GPGNet achieves competitive restoration fidelity with substantially fewer parameters than existing methods, enables real-time full-resolution 4K inference with significant speedup over the closest quality competitor, and maintains robust performance across a multi-level mixed-degradation schedule.A real-world study on phone-captured 4K video further confirms that the model, trained entirely on synthetic degradation, generalizes to unseen real sensor noise and improves downstream object detection under challenging conditions.

CVJul 19, 2023

Uncertainty-Driven Multi-Scale Feature Fusion Network for Real-time Image Deraining

Ming Tong, Xuefeng Yan, Yongzhen Wang

Visual-based measurement systems are frequently affected by rainy weather due to the degradation caused by rain streaks in captured images, and existing imaging devices struggle to address this issue in real-time. While most efforts leverage deep networks for image deraining and have made progress, their large parameter sizes hinder deployment on resource-constrained devices. Additionally, these data-driven models often produce deterministic results, without considering their inherent epistemic uncertainty, which can lead to undesired reconstruction errors. Well-calibrated uncertainty can help alleviate prediction errors and assist measurement devices in mitigating risks and improving usability. Therefore, we propose an Uncertainty-Driven Multi-Scale Feature Fusion Network (UMFFNet) that learns the probability mapping distribution between paired images to estimate uncertainty. Specifically, we introduce an uncertainty feature fusion block (UFFB) that utilizes uncertainty information to dynamically enhance acquired features and focus on blurry regions obscured by rain streaks, reducing prediction errors. In addition, to further boost the performance of UMFFNet, we fused feature information from multiple scales to guide the network for efficient collaborative rain removal. Extensive experiments demonstrate that UMFFNet achieves significant performance improvements with few parameters, surpassing state-of-the-art image deraining methods.

59.6CVMar 16Code

M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts

Shiwei Wang, Yongzhen Wang, Bingwen Hu et al.

While Transformer-based architectures have dominated recent advances in all-in-one image restoration, they remain fundamentally reactive: propagating degradations rather than proactively suppressing them. In the absence of explicit suppression mechanisms, degraded signals interfere with feature learning, compelling the decoder to balance artifact removal and detail preservation, thereby increasing model complexity and limiting adaptability. To address these challenges, we propose M2IR, a novel restoration framework that proactively regulates degradation propagation during the encoding stage and efficiently eliminates residual degradations during decoding. Specifically, the Mamba-Style Transformer (MST) block performs pixel-wise selective state modulation to mitigate degradations while preserving structural integrity. In parallel, the Adaptive Degradation Expert Collaboration (ADEC) module utilizes degradation-specific experts guided by a DA-CLIP-driven router and complemented by a shared expert to eliminate residual degradations through targeted and cooperative restoration. By integrating the MST block and ADEC module, M2IR transitions from passive reaction to active degradation control, effectively harnessing learned representations to achieve superior generalization, enhanced adaptability, and refined recovery of fine-grained details across diverse all-in-one image restoration benchmarks. Our source codes are available at https://github.com/Im34v/M2IR.

CVMay 15, 2024Code

RSHazeDiff: A Unified Fourier-aware Diffusion Model for Remote Sensing Image Dehazing

Jiamei Xiong, Xuefeng Yan, Yongzhen Wang et al.

Haze severely degrades the visual quality of remote sensing images and hampers the performance of road extraction, vehicle detection, and traffic flow monitoring. The emerging denoising diffusion probabilistic model (DDPM) exhibits the significant potential for dense haze removal with its strong generation ability. Since remote sensing images contain extensive small-scale texture structures, it is important to effectively restore image details from hazy images. However, current wisdom of DDPM fails to preserve image details and color fidelity well, limiting its dehazing capacity for remote sensing images. In this paper, we propose a novel unified Fourier-aware diffusion model for remote sensing image dehazing, termed RSHazeDiff. From a new perspective, RSHazeDiff explores the conditional DDPM to improve image quality in dense hazy scenarios, and it makes three key contributions. First, RSHazeDiff refines the training phase of diffusion process by performing noise estimation and reconstruction constraints in a coarse-to-fine fashion. Thus, it remedies the unpleasing results caused by the simple noise estimation constraint in DDPM. Second, by taking the frequency information as important prior knowledge during iterative sampling steps, RSHazeDiff can preserve more texture details and color fidelity in dehazed images. Third, we design a global compensated learning module to utilize the Fourier transform to capture the global dependency features of input images, which can effectively mitigate the effects of boundary artifacts when processing fixed-size patches. Experiments on both synthetic and real-world benchmarks validate the favorable performance of RSHazeDiff over state-of-the-art methods. Source code will be released at https://github.com/jm-xiong/RSHazeDiff.

CVMar 7, 2024Code

FriendNet: Detection-Friendly Dehazing Network

Yihua Fan, Yongzhen Wang, Mingqiang Wei et al.

Adverse weather conditions often impair the quality of captured images, inevitably inducing cutting-edge object detection models for advanced driver assistance systems (ADAS) and autonomous driving. In this paper, we raise an intriguing question: can the combination of image restoration and object detection enhance detection performance in adverse weather conditions? To answer it, we propose an effective architecture that bridges image dehazing and object detection together via guidance information and task-driven learning to achieve detection-friendly dehazing, termed FriendNet. FriendNet aims to deliver both high-quality perception and high detection capacity. Different from existing efforts that intuitively treat image dehazing as pre-processing, FriendNet establishes a positive correlation between these two tasks. Clean features generated by the dehazing network potentially contribute to improvements in object detection performance. Conversely, object detection crucially guides the learning process of the image dehazing network under the task-driven learning scheme. We shed light on how downstream tasks can guide upstream dehazing processes, considering both network architecture and learning objectives. We design Guidance Fusion Block (GFB) and Guidance Attention Block (GAB) to facilitate the integration of detection information into the network. Furthermore, the incorporation of the detection task loss aids in refining the optimization process. Additionally, we introduce a new Physics-aware Feature Enhancement Block (PFEB), which integrates physics-based priors to enhance the feature extraction and representation capabilities. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of our method over state-of-the-art methods on both image quality and detection precision. Our source code is available at https://github.com/fanyihua0309/FriendNet.

CVMay 7, 2025Code

WDMamba: When Wavelet Degradation Prior Meets Vision Mamba for Image Dehazing

Jie Sun, Heng Liu, Yongzhen Wang et al.

In this paper, we reveal a novel haze-specific wavelet degradation prior observed through wavelet transform analysis, which shows that haze-related information predominantly resides in low-frequency components. Exploiting this insight, we propose a novel dehazing framework, WDMamba, which decomposes the image dehazing task into two sequential stages: low-frequency restoration followed by detail enhancement. This coarse-to-fine strategy enables WDMamba to effectively capture features specific to each stage of the dehazing process, resulting in high-quality restored images. Specifically, in the low-frequency restoration stage, we integrate Mamba blocks to reconstruct global structures with linear complexity, efficiently removing overall haze and producing a coarse restored image. Thereafter, the detail enhancement stage reinstates fine-grained information that may have been overlooked during the previous phase, culminating in the final dehazed output. Furthermore, to enhance detail retention and achieve more natural dehazing, we introduce a self-guided contrastive regularization during network training. By utilizing the coarse restored output as a hard negative example, our model learns more discriminative representations, substantially boosting the overall dehazing performance. Extensive evaluations on public dehazing benchmarks demonstrate that our method surpasses state-of-the-art approaches both qualitatively and quantitatively. Code is available at https://github.com/SunJ000/WDMamba.

CVJul 1, 2025Code

Laplace-Mamba: Laplace Frequency Prior-Guided Mamba-CNN Fusion Network for Image Dehazing

Yongzhen Wang, Liangliang Chen, Bingwen Hu et al.

Recent progress in image restoration has underscored Spatial State Models (SSMs) as powerful tools for modeling long-range dependencies, owing to their appealing linear complexity and computational efficiency. However, SSM-based approaches exhibit limitations in reconstructing localized structures and tend to be less effective when handling high-dimensional data, frequently resulting in suboptimal recovery of fine image features. To tackle these challenges, we introduce Laplace-Mamba, a novel framework that integrates Laplace frequency prior with a hybrid Mamba-CNN architecture for efficient image dehazing. Leveraging the Laplace decomposition, the image is disentangled into low-frequency components capturing global texture and high-frequency components representing edges and fine details. This decomposition enables specialized processing via dual parallel pathways: the low-frequency branch employs SSMs for global context modeling, while the high-frequency branch utilizes CNNs to refine local structural details, effectively addressing diverse haze scenarios. Notably, the Laplace transformation facilitates information-preserving downsampling of low-frequency components in accordance with the Nyquist theory, thereby significantly improving computational efficiency. Extensive evaluations across multiple benchmarks demonstrate that our method outperforms state-of-the-art approaches in both restoration quality and efficiency. The source code and pretrained models are available at https://github.com/yz-wang/Laplace-Mamba.

CVApr 7, 2025

DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration

Jiamei Xiong, Xuefeng Yan, Yongzhen Wang et al.

Image restoration under adverse weather conditions is a critical task for many vision-based applications. Recent all-in-one frameworks that handle multiple weather degradations within a unified model have shown potential. However, the diversity of degradation patterns across different weather conditions, as well as the complex and varied nature of real-world degradations, pose significant challenges for multiple weather removal. To address these challenges, we propose an innovative diffusion paradigm with degradation-aware adaptive priors for all-in-one weather restoration, termed DA2Diff. It is a new exploration that applies CLIP to perceive degradation-aware properties for better multi-weather restoration. Specifically, we deploy a set of learnable prompts to capture degradation-aware representations by the prompt-image similarity constraints in the CLIP space. By aligning the snowy/hazy/rainy images with snow/haze/rain prompts, each prompt contributes to different weather degradation characteristics. The learned prompts are then integrated into the diffusion model via the designed weather specific prompt guidance module, making it possible to restore multiple weather types. To further improve the adaptiveness to complex weather degradations, we propose a dynamic expert selection modulator that employs a dynamic weather-aware router to flexibly dispatch varying numbers of restoration experts for each weather-distorted image, allowing the diffusion model to restore diverse degradations adaptively. Experimental results substantiate the favorable performance of DA2Diff over state-of-the-arts in quantitative and qualitative evaluation. Source code will be available after acceptance.

CVJun 9, 2025

M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration

Yongzhen Wang, Yongjun Li, Zhuoran Zheng et al.

Natural images are often degraded by complex, composite degradations such as rain, snow, and haze, which adversely impact downstream vision applications. While existing image restoration efforts have achieved notable success, they are still hindered by two critical challenges: limited generalization across dynamically varying degradation scenarios and a suboptimal balance between preserving local details and modeling global dependencies. To overcome these challenges, we propose M2Restore, a novel Mixture-of-Experts (MoE)-based Mamba-CNN fusion framework for efficient and robust all-in-one image restoration. M2Restore introduces three key contributions: First, to boost the model's generalization across diverse degradation conditions, we exploit a CLIP-guided MoE gating mechanism that fuses task-conditioned prompts with CLIP-derived semantic priors. This mechanism is further refined via cross-modal feature calibration, which enables precise expert selection for various degradation types. Second, to jointly capture global contextual dependencies and fine-grained local details, we design a dual-stream architecture that integrates the localized representational strength of CNNs with the long-range modeling efficiency of Mamba. This integration enables collaborative optimization of global semantic relationships and local structural fidelity, preserving global coherence while enhancing detail restoration. Third, we introduce an edge-aware dynamic gating mechanism that adaptively balances global modeling and local enhancement by reallocating computational attention to degradation-sensitive regions. This targeted focus leads to more efficient and precise restoration. Extensive experiments across multiple image restoration benchmarks validate the superiority of M2Restore in both visual quality and quantitative performance.

AIJan 17, 2024

Knowledge Pyramid: A Novel Hierarchical Reasoning Structure for Generalized Knowledge Augmentation and Inference

Qinghua Huang, Yongzhen Wang

Knowledge graph (KG) based reasoning has been regarded as an effective means for the analysis of semantic networks and is of great usefulness in areas of information retrieval, recommendation, decision-making, and man-machine interaction. It is widely used in recommendation, decision-making, question-answering, search, and other fields. However, previous studies mainly used low-level knowledge in the KG for reasoning, which may result in insufficient generalization and poor robustness of reasoning. To this end, this paper proposes a new inference approach using a novel knowledge augmentation strategy to improve the generalization capability of KG. This framework extracts high-level pyramidal knowledge from low-level knowledge and applies it to reasoning in a multi-level hierarchical KG, called knowledge pyramid in this paper. We tested some medical data sets using the proposed approach, and the experimental results show that the proposed knowledge pyramid has improved the knowledge inference performance with better generalization. Especially, when there are fewer training samples, the inference accuracy can be significantly improved.

CYApr 29, 2021

Leveraging Online Shopping Behaviors as a Proxy for Personal Lifestyle Choices: New Insights into Chronic Disease Prevention Literacy

Yongzhen Wang, Xiaozhong Liu, Katy Börner et al.

Objective: Ubiquitous internet access is reshaping the way we live, but it is accompanied by unprecedented challenges in preventing chronic diseases that are usually planted by long exposure to unhealthy lifestyles. This paper proposes leveraging online shopping behaviors as a proxy for personal lifestyle choices to improve chronic disease prevention literacy, targeted for times when e-commerce user experience has been assimilated into most people's everyday lives. Methods: Longitudinal query logs and purchase records from 15 million online shoppers were accessed, constructing a broad spectrum of lifestyle features covering various product categories and buyer personas. Using the lifestyle-related information preceding online shoppers' first purchases of specific prescription drugs, we could determine associations between their past lifestyle choices and whether they suffered from a particular chronic disease. Results: Novel lifestyle risk factors were discovered in two exemplars--depression and type 2 diabetes, most of which showed reasonable consistency with existing healthcare knowledge. Further, such empirical findings could be adopted to locate online shoppers at higher risk of these chronic diseases with decent accuracy [i.e., (area under the receiver operating characteristic curve) AUC=0.68 for depression and AUC=0.70 for type 2 diabetes], closely matching the performance of screening surveys benchmarked against medical diagnosis. Conclusions: Mining online shopping behaviors can point medical experts to a series of lifestyle issues associated with chronic diseases that are less explored to date. Hopefully, unobtrusive chronic disease surveillance via e-commerce sites can grant consenting individuals a privilege to be connected more readily with the medical profession and sophistication.

CLOct 7, 2020

Knowledge-enriched, Type-constrained and Grammar-guided Question Generation over Knowledge Bases

Sheng Bi, Xiya Cheng, Yuan-Fang Li et al.

Question generation over knowledge bases (KBQG) aims at generating natural-language questions about a subgraph, i.e. a set of (connected) triples. Two main challenges still face the current crop of encoder-decoder-based methods, especially on small subgraphs: (1) low diversity and poor fluency due to the limited information contained in the subgraphs, and (2) semantic drift due to the decoder's oblivion of the semantics of the answer entity. We propose an innovative knowledge-enriched, type-constrained and grammar-guided KBQG model, named KTG, to addresses the above challenges. In our model, the encoder is equipped with auxiliary information from the KB, and the decoder is constrained with word types during QG. Specifically, entity domain and description, as well as relation hierarchy information are considered to construct question contexts, while a conditional copy mechanism is incorporated to modulate question semantics according to current word types. Besides, a novel reward function featuring grammatical similarity is designed to improve both generative richness and syntactic correctness via reinforcement learning. Extensive experiments show that our proposed model outperforms existing methods by a significant margin on two widely-used benchmark datasets SimpleQuestion and PathQuestion.

CLOct 7, 2020

Knowledge-aware Method for Confusing Charge Prediction

Xiya Cheng, Sheng Bi, Guilin Qi et al.

Automatic charge prediction task aims to determine the final charges based on fact descriptions of criminal cases, which is a vital application of legal assistant systems. Conventional works usually depend on fact descriptions to predict charges while ignoring the legal schematic knowledge, which makes it difficult to distinguish confusing charges. In this paper, we propose a knowledge-attentive neural network model, which introduces legal schematic knowledge about charges and exploit the knowledge hierarchical representation as the discriminative features to differentiate confusing charges. Our model takes the textual fact description as the input and learns fact representation through a graph convolutional network. A legal schematic knowledge transformer is utilized to generate crucial knowledge representations oriented to the legal schematic knowledge at both the schema and charge levels. We apply a knowledge matching network for effectively incorporating charge information into the fact to learn knowledge-aware fact representation. Finally, we use the knowledge-aware fact representation for charge prediction. We create two real-world datasets and experimental results show that our proposed model can outperform other state-of-the-art baselines on accuracy and F1 score, especially on dealing with confusing charges.

IRJun 24, 2020

Community-Based Data Integration of Course and Job Data in Support of Personalized Career-Education Recommendations

Guoqing Zhu, Naga Anjaneyulu Kopalle, Yongzhen Wang et al.

How does your education impact your professional career? Ideally, the courses you take help you identify, get hired for, and perform the job you always wanted. However, not all courses provide skills that transfer to existing and future jobs; skill terms used in course descriptions might be different from those listed in job advertisements; and there might exist a considerable skill gap between what is taught in courses and what is needed for a job. In this study, we propose a novel method to integrate extensive course description and job advertisement data by leveraging heterogeneous data integration and community detection. The innovative heterogeneous graph approach along with identified skill communities enables cross-domain information recommendation, e.g., given an educational profile, job recommendations can be provided together with suggestions on education opportunities for re- and upskilling in support of lifelong learning.

CLJan 28, 2019

Yongzhen Wang, Xiaozhong Liu, Zheng Gao

Conventional solutions to automatic related work summarization rely heavily on human-engineered features. In this paper, we develop a neural data-driven summarizer by leveraging the seq2seq paradigm, in which a joint context-driven attention mechanism is proposed to measure the contextual relevance within full texts and a heterogeneous bibliography graph simultaneously. Our motivation is to maintain the topic coherency between a related work section and its target document, where both the textual and graphic contexts play a big role in characterizing the relationship among scientific publications accurately. Experimental results on a large dataset show that our approach achieves a considerable improvement over a typical seq2seq summarizer and five classical summarization baselines.