Deming Zhai

CV
h-index28
20papers
411citations
Novelty53%
AI Score51

20 Papers

CVApr 6, 2023Code
Super-Resolving Face Image by Facial Parsing Information

Chenyang Wang, Junjun Jiang, Zhiwei Zhong et al.

Face super-resolution is a technology that transforms a low-resolution face image into the corresponding high-resolution one. In this paper, we build a novel parsing map guided face super-resolution network which extracts the face prior (i.e., parsing map) directly from low-resolution face image for the following utilization. To exploit the extracted prior fully, a parsing map attention fusion block is carefully designed, which can not only effectively explore the information of parsing map, but also combines powerful attention mechanism. Moreover, in light of that high-resolution features contain more precise spatial information while low-resolution features provide strong contextual information, we hope to maintain and utilize these complementary information. To achieve this goal, we develop a multi-scale refine block to maintain spatial and contextual information and take advantage of multi-scale features to refine the feature representations. Experimental results demonstrate that our method outperforms the state-of-the-arts in terms of quantitative metrics and visual quality. The source codes will be available at https://github.com/wcy-cs/FishFSRNet.

CVMar 8, 2022
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon

Yiqi Zhong, Xianming Liu, Deming Zhai et al.

Estimating the risk level of adversarial examples is essential for safely deploying machine learning models in the real world. One popular approach for physical-world attacks is to adopt the "sticker-pasting" strategy, which however suffers from some limitations, including difficulties in access to the target or printing by valid colors. A new type of non-invasive attacks emerged recently, which attempt to cast perturbation onto the target by optics based tools, such as laser beam and projector. However, the added optical patterns are artificial but not natural. Thus, they are still conspicuous and attention-grabbed, and can be easily noticed by humans. In this paper, we study a new type of optical adversarial examples, in which the perturbations are generated by a very common natural phenomenon, shadow, to achieve naturalistic and stealthy physical-world adversarial attack under the black-box setting. We extensively evaluate the effectiveness of this new attack on both simulated and real-world environments. Experimental results on traffic sign recognition demonstrate that our algorithm can generate adversarial examples effectively, reaching 98.23% and 90.47% success rates on LISA and GTSRB test sets respectively, while continuously misleading a moving camera over 95% of the time in real-world scenarios. We also offer discussions about the limitations and the defense mechanism of this attack.

CVJun 23, 2022
Learning Towards the Largest Margins

Xiong Zhou, Xianming Liu, Deming Zhai et al.

One of the main challenges for feature representation in deep learning-based classification is the design of appropriate loss functions that exhibit strong discriminative power. The classical softmax loss does not explicitly encourage discriminative learning of features. A popular direction of research is to incorporate margins in well-established losses in order to enforce extra intra-class compactness and inter-class separability, which, however, were developed through heuristic means, as opposed to rigorous mathematical principles. In this work, we attempt to address this limitation by formulating the principled optimization objective as learning towards the largest margins. Specifically, we firstly define the class margin as the measure of inter-class separability, and the sample margin as the measure of intra-class compactness. Accordingly, to encourage discriminative representation of features, the loss function should promote the largest possible margins for both classes and samples. Furthermore, we derive a generalized margin softmax loss to draw general conclusions for the existing margin-based losses. Not only does this principled framework offer new perspectives to understand and interpret existing margin-based losses, but it also provides new insights that can guide the design of new tools, including sample margin regularization and largest margin softmax loss for the class-balanced case, and zero-centroid regularization for the class-imbalanced case. Experimental results demonstrate the effectiveness of our strategy on a variety of tasks, including visual classification, imbalanced classification, person re-identification, and face verification.

LGJun 23, 2022
Prototype-Anchored Learning for Learning with Imperfect Annotations

Xiong Zhou, Xianming Liu, Deming Zhai et al.

The success of deep neural networks greatly relies on the availability of large amounts of high-quality annotated data, which however are difficult or expensive to obtain. The resulting labels may be class imbalanced, noisy or human biased. It is challenging to learn unbiased classification models from imperfectly annotated datasets, on which we usually suffer from overfitting or underfitting. In this work, we thoroughly investigate the popular softmax loss and margin-based loss, and offer a feasible approach to tighten the generalization error bound by maximizing the minimal sample margin. We further derive the optimality condition for this purpose, which indicates how the class prototypes should be anchored. Motivated by theoretical analysis, we propose a simple yet effective method, namely prototype-anchored learning (PAL), which can be easily incorporated into various learning-based classification schemes to handle imperfect annotation. We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.

LGNov 15, 2025
Variation-Bounded Loss for Noise-Tolerant Learning

Jialiang Wang, Xiong Zhou, Xianming Liu et al.

Mitigating the negative impact of noisy labels has been aperennial issue in supervised learning. Robust loss functions have emerged as a prevalent solution to this problem. In this work, we introduce the Variation Ratio as a novel property related to the robustness of loss functions, and propose a new family of robust loss functions, termed Variation-Bounded Loss (VBL), which is characterized by a bounded variation ratio. We provide theoretical analyses of the variation ratio, proving that a smaller variation ratio would lead to better robustness. Furthermore, we reveal that the variation ratio provides a feasible method to relax the symmetric condition and offers a more concise path to achieve the asymmetric condition. Based on the variation ratio, we reformulate several commonly used loss functions into a variation-bounded form for practical applications. Positive experiments on various datasets exhibit the effectiveness and flexibility of our approach.

LGAug 4, 2025Code
$ε$-Softmax: Approximating One-Hot Vectors for Mitigating Label Noise

Jialiang Wang, Xiong Zhou, Deming Zhai et al.

Noisy labels pose a common challenge for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions to achieve noise tolerance in the presence of label noise, particularly symmetric losses. However, they usually suffer from the underfitting issue due to the overly strict symmetric condition. In this work, we propose a simple yet effective approach for relaxing the symmetric condition, namely $ε$-softmax, which simply modifies the outputs of the softmax layer to approximate one-hot vectors with a controllable error $ε$. Essentially, $ε$-softmax not only acts as an alternative for the softmax layer, but also implicitly plays the crucial role in modifying the loss function. We prove theoretically that $ε$-softmax can achieve noise-tolerant learning with controllable excess risk bound for almost any loss function. Recognizing that $ε$-softmax-enhanced losses may slightly reduce fitting ability on clean datasets, we further incorporate them with one symmetric loss, thereby achieving a better trade-off between robustness and effective learning. Extensive experiments demonstrate the superiority of our method in mitigating synthetic and real-world label noise. The code is available at https://github.com/cswjl/eps-softmax.

LGJul 23, 2025Code
Joint Asymmetric Loss for Learning with Noisy Labels

Jialiang Wang, Xianming Liu, Xiong Zhou et al.

Learning with noisy labels is a crucial task for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions, particularly symmetric losses. Nevertheless, symmetric losses usually suffer from the underfitting issue due to the overly strict constraint. To address this problem, the Active Passive Loss (APL) jointly optimizes an active and a passive loss to mutually enhance the overall fitting ability. Within APL, symmetric losses have been successfully extended, yielding advanced robust loss functions. Despite these advancements, emerging theoretical analyses indicate that asymmetric losses, a new class of robust loss functions, possess superior properties compared to symmetric losses. However, existing asymmetric losses are not compatible with advanced optimization frameworks such as APL, limiting their potential and applicability. Motivated by this theoretical gap and the prospect of asymmetric losses, we extend the asymmetric loss to the more complex passive loss scenario and propose the Asymetric Mean Square Error (AMSE), a novel asymmetric loss. We rigorously establish the necessary and sufficient condition under which AMSE satisfies the asymmetric condition. By substituting the traditional symmetric passive loss in APL with our proposed AMSE, we introduce a novel robust loss framework termed Joint Asymmetric Loss (JAL). Extensive experiments demonstrate the effectiveness of our method in mitigating label noise. Code available at: https://github.com/cswjl/joint-asymmetric-loss

LGJul 31, 2021Code
Learning with Noisy Labels via Sparse Regularization

Xiong Zhou, Xianming Liu, Chenyang Wang et al.

Learning with noisy labels is an important and challenging task for training accurate deep neural networks. Some commonly-used loss functions, such as Cross Entropy (CE), suffer from severe overfitting to noisy labels. Robust loss functions that satisfy the symmetric condition were tailored to remedy this problem, which however encounter the underfitting effect. In this paper, we theoretically prove that \textbf{any loss can be made robust to noisy labels} by restricting the network output to the set of permutations over a fixed vector. When the fixed vector is one-hot, we only need to constrain the output to be one-hot, which however produces zero gradients almost everywhere and thus makes gradient-based optimization difficult. In this work, we introduce the sparse regularization strategy to approximate the one-hot constraint, which is composed of network output sharpening operation that enforces the output distribution of a network to be sharp and the $\ell_p$-norm ($p\le 1$) regularization that promotes the network output to be sparse. This simple approach guarantees the robustness of arbitrary loss functions while not hindering the fitting ability. Experimental results demonstrate that our method can significantly improve the performance of commonly-used loss functions in the presence of noisy labels and class imbalance, and outperform the state-of-the-art methods. The code is available at https://github.com/hitcszx/lnl_sr.

IVOct 23, 2024
Learning Lossless Compression for High Bit-Depth Volumetric Medical Image

Kai Wang, Yuanchao Bai, Daxin Li et al.

Recent advances in learning-based methods have markedly enhanced the capabilities of image compression. However, these methods struggle with high bit-depth volumetric medical images, facing issues such as degraded performance, increased memory demand, and reduced processing speed. To address these challenges, this paper presents the Bit-Division based Lossless Volumetric Image Compression (BD-LVIC) framework, which is tailored for high bit-depth medical volume compression. The BD-LVIC framework skillfully divides the high bit-depth volume into two lower bit-depth segments: the Most Significant Bit-Volume (MSBV) and the Least Significant Bit-Volume (LSBV). The MSBV concentrates on the most significant bits of the volumetric medical image, capturing vital structural details in a compact manner. This reduction in complexity greatly improves compression efficiency using traditional codecs. Conversely, the LSBV deals with the least significant bits, which encapsulate intricate texture details. To compress this detailed information effectively, we introduce an effective learning-based compression model equipped with a Transformer-Based Feature Alignment Module, which exploits both intra-slice and inter-slice redundancies to accurately align features. Subsequently, a Parallel Autoregressive Coding Module merges these features to precisely estimate the probability distribution of the least significant bit-planes. Our extensive testing demonstrates that the BD-LVIC framework not only sets new performance benchmarks across various datasets but also maintains a competitive coding speed, highlighting its significant potential and practical utility in the realm of volumetric medical image compression.

CVOct 15, 2024
MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields

Yuru Xiao, Deming Zhai, Wenbo Zhao et al.

Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering. However, with sparse input views, the lack of multi-view consistency constraints results in poorly initialized Gaussians and unreliable heuristics for optimization, leading to suboptimal performance. Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images. Additionally, they rely on dense initialization, which limits the efficiency of scene representation. To overcome these challenges, we propose a view synthesis framework based on 3D Gaussian Splatting, named MCGS, enabling photorealistic scene reconstruction from sparse views. The key innovations of MCGS in enhancing multi-view consistency are as follows: i) We leverage matching priors from a sparse matcher to initialize Gaussians primarily on textured regions, while low-texture areas are populated with randomly distributed Gaussians. This yields a compact yet sufficient set of initial Gaussians. ii) We propose a multi-view consistency-guided progressive pruning strategy to dynamically eliminate inconsistent Gaussians. This approach confines their optimization to a consistency-constrained space, which ensures robust and coherent scene reconstruction. These strategies enhance robustness to sparse views, accelerate rendering, and reduce memory consumption, making MCGS a practical framework for 3D Gaussian Splatting.

LGDec 13, 2023
On the Dynamics Under the Unhinged Loss and Beyond

Xiong Zhou, Xianming Liu, Hanzhang Wang et al.

Recent works have studied implicit biases in deep learning, especially the behavior of last-layer features and classifier weights. However, they usually need to simplify the intermediate dynamics under gradient flow or gradient descent due to the intractability of loss functions and model architectures. In this paper, we introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze the closed-form dynamics while requiring as few simplifications or assumptions as possible. The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization. Based on the layer-peeled model that views last-layer features as free optimization variables, we conduct a thorough analysis in the unconstrained, regularized, and spherical constrained cases, as well as the case where the neural tangent kernel remains invariant. To bridge the performance of the unhinged loss to that of Cross-Entropy (CE), we investigate the scenario of fixing classifier weights with a specific structure, (e.g., a simplex equiangular tight frame). Our analysis shows that these dynamics converge exponentially fast to a solution depending on the initialization of features and classifier weights. These theoretical results not only offer valuable insights, including explicit feature regularization and rescaled learning rates for enhancing practical training with the unhinged loss, but also extend their applicability to other loss functions. Finally, we empirically demonstrate these theoretical results and insights through extensive experiments.

CVApr 1, 2024
SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance

Yuru Xiao, Xianming Liu, Deming Zhai et al.

Neural Radiance Field (NeRF) technology has made significant strides in creating novel viewpoints. However, its effectiveness is hampered when working with sparsely available views, often leading to performance dips due to overfitting. FreeNeRF attempts to overcome this limitation by integrating implicit geometry regularization, which incrementally improves both geometry and textures. Nonetheless, an initial low positional encoding bandwidth results in the exclusion of high-frequency elements. The quest for a holistic approach that simultaneously addresses overfitting and the preservation of high-frequency details remains ongoing. This study presents a novel feature-matching-based sparse geometry regularization module, enhanced by a spatially consistent geometry filtering mechanism and a frequency-guided geometric regularization strategy. This module excels at accurately identifying high-frequency keypoints, effectively preserving fine structural details. Through progressive refinement of geometry and textures across NeRF iterations, we unveil an effective few-shot neural rendering architecture, designated as SGCNeRF, for enhanced novel view synthesis. Our experiments demonstrate that SGCNeRF not only achieves superior geometry-consistent outcomes but also surpasses FreeNeRF, with improvements of 0.7 dB in PSNR on LLFF and DTU.

CVAug 4, 2025
VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling

Yuru Xiao, Zihan Lin, Chao Lu et al.

Dynamic urban scene modeling is a rapidly evolving area with broad applications. While current approaches leveraging neural radiance fields or Gaussian Splatting have achieved fine-grained reconstruction and high-fidelity novel view synthesis, they still face significant limitations. These often stem from a dependence on pre-calibrated object tracks or difficulties in accurately modeling fast-moving objects from undersampled capture, particularly due to challenges in handling temporal discontinuities. To overcome these issues, we propose a novel video diffusion-enhanced 4D Gaussian Splatting framework. Our key insight is to distill robust, temporally consistent priors from a test-time adapted video diffusion model. To ensure precise pose alignment and effective integration of this denoised content, we introduce two core innovations: a joint timestamp optimization strategy that refines interpolated frame poses, and an uncertainty distillation method that adaptively extracts target content while preserving well-reconstructed regions. Extensive experiments demonstrate that our method significantly enhances dynamic modeling, especially for fast-moving objects, achieving an approximate PSNR gain of 2 dB for novel view synthesis over baseline approaches.

CVJun 12, 2024
Spatial Annealing for Efficient Few-shot Neural Rendering

Yuru Xiao, Deming Zhai, Wenbo Zhao et al.

Neural Radiance Fields (NeRF) with hybrid representations have shown impressive capabilities for novel view synthesis, delivering high efficiency. Nonetheless, their performance significantly drops with sparse input views. Various regularization strategies have been devised to address these challenges. However, these strategies either require additional rendering costs or involve complex pipeline designs, leading to a loss of training efficiency. Although FreeNeRF has introduced an efficient frequency annealing strategy, its operation on frequency positional encoding is incompatible with the efficient hybrid representations. In this paper, we introduce an accurate and efficient few-shot neural rendering method named \textbf{S}patial \textbf{A}nnealing regularized \textbf{NeRF} (\textbf{SANeRF}), which adopts the pre-filtering design of a hybrid representation. We initially establish the analytical formulation of the frequency band limit for a hybrid architecture by deducing its filtering process. Based on this analysis, we propose a universal form of frequency annealing in the spatial domain, which can be implemented by modulating the sampling kernel to exponentially shrink from an initial one with a narrow grid tangent kernel spectrum. This methodology is crucial for stabilizing the early stages of the training phase and significantly contributes to enhancing the subsequent process of detail refinement. Our extensive experiments reveal that, by adding merely one line of code, SANeRF delivers superior rendering quality and much faster reconstruction speed compared to current few-shot neural rendering methods. Notably, SANeRF outperforms FreeNeRF on the Blender dataset, achieving 700$\times$ faster reconstruction speed.

CVMay 10, 2024
Mesh Denoising Transformer

Wenbo Zhao, Xianming Liu, Deming Zhai et al.

Mesh denoising, aimed at removing noise from input meshes while preserving their feature structures, is a practical yet challenging task. Despite the remarkable progress in learning-based mesh denoising methodologies in recent years, their network designs often encounter two principal drawbacks: a dependence on single-modal geometric representations, which fall short in capturing the multifaceted attributes of meshes, and a lack of effective global feature aggregation, hindering their ability to fully understand the mesh's comprehensive structure. To tackle these issues, we propose SurfaceFormer, a pioneering Transformer-based mesh denoising framework. Our first contribution is the development of a new representation known as Local Surface Descriptor, which is crafted by establishing polar systems on each mesh face, followed by sampling points from adjacent surfaces using geodesics. The normals of these points are organized into 2D patches, mimicking images to capture local geometric intricacies, whereas the poles and vertex coordinates are consolidated into a point cloud to embody spatial information. This advancement surmounts the hurdles posed by the irregular and non-Euclidean characteristics of mesh data, facilitating a smooth integration with Transformer architecture. Next, we propose a dual-stream structure consisting of a Geometric Encoder branch and a Spatial Encoder branch, which jointly encode local geometry details and spatial information to fully explore multimodal information for mesh denoising. A subsequent Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation through self-attention operators. Our experimental evaluations demonstrate that this novel approach outperforms existing state-of-the-art methods in both objective and subjective assessments, marking a significant leap forward in mesh denoising.

CRMay 28, 2023
Backdoor Attacks Against Incremental Learners: An Empirical Evaluation Study

Yiqi Zhong, Xianming Liu, Deming Zhai et al.

Large amounts of incremental learning algorithms have been proposed to alleviate the catastrophic forgetting issue arises while dealing with sequential data on a time series. However, the adversarial robustness of incremental learners has not been widely verified, leaving potential security risks. Specifically, for poisoning-based backdoor attacks, we argue that the nature of streaming data in IL provides great convenience to the adversary by creating the possibility of distributed and cross-task attacks -- an adversary can affect \textbf{any unknown} previous or subsequent task by data poisoning \textbf{at any time or time series} with extremely small amount of backdoor samples injected (e.g., $0.1\%$ based on our observations). To attract the attention of the research community, in this paper, we empirically reveal the high vulnerability of 11 typical incremental learners against poisoning-based backdoor attack on 3 learning scenarios, especially the cross-task generalization effect of backdoor knowledge, while the poison ratios range from $5\%$ to as low as $0.1\%$. Finally, the defense mechanism based on activation clustering is found to be effective in detecting our trigger pattern to mitigate potential security risks.

CVOct 15, 2020
Fully Unsupervised Person Re-identification viaSelective Contrastive Learning

Bo Pang, Deming Zhai, Junjun Jiang et al.

Person re-identification (ReID) aims at searching the same identity person among images captured by various cameras. Unsupervised person ReID attracts a lot of attention recently, due to it works without intensive manual annotation and thus shows great potential of adapting to new conditions. Representation learning plays a critical role in unsupervised person ReID. In this work, we propose a novel selective contrastive learning framework for unsupervised feature learning. Specifically, different from traditional contrastive learning strategies, we propose to use multiple positives and adaptively sampled negatives for defining the contrastive loss, enabling to learn a feature embedding model with stronger identity discriminative representation. Moreover, we propose to jointly leverage global and local features to construct three dynamic dictionaries, among which the global and local memory banks are used for pairwise similarity computation and the mixture memory bank are used for contrastive loss definition. Experimental results demonstrate the superiority of our method in unsupervised person ReID compared with the state-of-the-arts.

CVJun 9, 2020
Single Image Deraining via Scale-space Invariant Attention Neural Network

Bo Pang, Deming Zhai, Junjun Jiang et al.

Image enhancement from degradation of rainy artifacts plays a critical role in outdoor visual computing systems. In this paper, we tackle the notion of scale that deals with visual changes in appearance of rain steaks with respect to the camera. Specifically, we revisit multi-scale representation by scale-space theory, and propose to represent the multi-scale correlation in convolutional feature domain, which is more compact and robust than that in pixel domain. Moreover, to improve the modeling ability of the network, we do not treat the extracted multi-scale features equally, but design a novel scale-space invariant attention mechanism to help the network focus on parts of the features. In this way, we summarize the most activated presence of feature maps as the salient features. Extensive experiments results on synthetic and real rainy scenes demonstrate the superior performance of our scheme over the state-of-the-arts.

CVMar 17, 2020
Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis

Ruifeng Shi, Deming Zhai, Xianming Liu et al.

Plant diseases serve as one of main threats to food security and crop production. It is thus valuable to exploit recent advances of artificial intelligence to assist plant disease diagnosis. One popular approach is to transform this problem as a leaf image classification task, which can be then addressed by the powerful convolutional neural networks (CNNs). However, the performance of CNN-based classification approach depends on a large amount of high-quality manually labeled training data, which are inevitably introduced noise on labels in practice, leading to model overfitting and performance degradation. To overcome this problem, we propose a novel framework that incorporates rectified meta-learning module into common CNN paradigm to train a noise-robust deep network without using extra supervision information. The proposed method enjoys the following merits: i) A rectified meta-learning is designed to pay more attention to unbiased samples, leading to accelerated convergence and improved classification accuracy. ii) Our method is free on assumption of label noise distribution, which works well on various kinds of noise. iii) Our method serves as a plug-and-play module, which can be embedded into any deep models optimized by gradient descent based method. Extensive experiments are conducted to demonstrate the superior performance of our algorithm over the state-of-the-arts.

IVMar 4, 2020
ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising

Yongsen Zhao, Deming Zhai, Junjun Jiang et al.

Hyperspectral image (HSI) denoising is of crucial importance for many subsequent applications, such as HSI classification and interpretation. In this paper, we propose an attention-based deep residual network to directly learn a mapping from noisy HSI to the clean one. To jointly utilize the spatial-spectral information, the current band and its $K$ adjacent bands are simultaneously exploited as the input. Then, we adopt convolution layer with different filter sizes to fuse the multi-scale feature, and use shortcut connection to incorporate the multi-level information for better noise removal. In addition, the channel attention mechanism is employed to make the network concentrate on the most relevant auxiliary information and features that are beneficial to the denoising process best. To ease the training procedure, we reconstruct the output through a residual mode rather than a straightforward prediction. Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.