28.7CVMay 29Code
Latent Geometric Chords for Query-Efficient Decision-Based Adversarial AttacksEi Hmue Khine, Yao Li, Jiebao Sun et al.
While decision-based black-box adversarial attacks present a severe security threat, current methodologies suffer from fundamental limitations. Pixel-wise attacks frequently introduce unnatural, high-frequency visual artifacts, while latent-space frameworks are confined by the limited search space of low-dimensional manifolds and inherent reconstruction flaws. To resolve these limitations, we propose Latent Geometric Chords (LGC) for Query-Efficient Decision-Based Adversarial Attacks alongside a variant, LGC-H. At its core, LGC navigates decision boundaries by executing a curvature-aware geometric search within a compressed semantic manifold. To guarantee high visual fidelity and circumvent dimensionality bottlenecks, we introduce a Residual-based Adversarial Generation (RAG) mechanism. RAG isolates semantic perturbations as geometric chords and superimposes them directly onto the original source image. RAG substantially resolves baseline reconstruction flaws and effectively doubles the permissible search space dimensions. Experimental results demonstrate that LGC achieves robust cross-dataset transferability and substantially outperforms state-of-the-art baselines. Notably, our method, LGC, minimizes perturbation magnitudes while achieving state-of-the-art visual fidelity--with a Structural Similarity Index Measure (SSIM) exceeding 0.99 and a Learned Perceptual Image Patch Similarity (LPIPS) below 0.01 at 5000 queries--and sustaining high attack success rates under stringent perceptual constraints, successfully compromising adversarially trained robust models. The source code is available at: https://github.com/eihmuekhine/Latent-Geometric-Chords.
CVJun 28, 2023
Evaluating Similitude and Robustness of Deep Image Denoising Models via Adversarial AttackJie Ning, Jiebao Sun, Yao Li et al.
Deep neural networks (DNNs) have shown superior performance comparing to traditional image denoising algorithms. However, DNNs are inevitably vulnerable while facing adversarial attacks. In this paper, we propose an adversarial attack method named denoising-PGD which can successfully attack all the current deep denoising models while keep the noise distribution almost unchanged. We surprisingly find that the current mainstream non-blind denoising models (DnCNN, FFDNet, ECNDNet, BRDNet), blind denoising models (DnCNN-B, Noise2Noise, RDDCNN-B, FAN), plug-and-play (DPIR, CurvPnP) and unfolding denoising models (DeamNet) almost share the same adversarial sample set on both grayscale and color images, respectively. Shared adversarial sample set indicates that all these models are similar in term of local behaviors at the neighborhood of all the test samples. Thus, we further propose an indicator to measure the local similarity of models, called robustness similitude. Non-blind denoising models are found to have high robustness similitude across each other, while hybrid-driven models are also found to have high robustness similitude with pure data-driven non-blind denoising models. According to our robustness assessment, data-driven non-blind denoising models are the most robust. We use adversarial training to complement the vulnerability to adversarial attacks. Moreover, the model-driven image denoising BM3D shows resistance on adversarial attacks.
CVAug 15, 2023
A Review of Adversarial Attacks in Computer VisionYutong Zhang, Yao Li, Yin Li et al.
Deep neural networks have been widely used in various downstream tasks, especially those safety-critical scenario such as autonomous driving, but deep networks are often threatened by adversarial samples. Such adversarial attacks can be invisible to human eyes, but can lead to DNN misclassification, and often exhibits transferability between deep learning and machine learning models and real-world achievability. Adversarial attacks can be divided into white-box attacks, for which the attacker knows the parameters and gradient of the model, and black-box attacks, for the latter, the attacker can only obtain the input and output of the model. In terms of the attacker's purpose, it can be divided into targeted attacks and non-targeted attacks, which means that the attacker wants the model to misclassify the original sample into the specified class, which is more practical, while the non-targeted attack just needs to make the model misclassify the sample. The black box setting is a scenario we will encounter in practice.
CVJun 29, 2023
Boosting the Generalization Ability for Hyperspectral Image Classification using Spectral-spatial Axial Aggregation TransformerEnzhe Zhao, Zhichang Guo, Shengzhu Shi et al.
In the hyperspectral image classification (HSIC) task, the most commonly used model validation paradigm is partitioning the training-test dataset through pixel-wise random sampling. By training on a small amount of data, the deep learning model can achieve almost perfect accuracy. However, in our experiments, we found that the high accuracy was reached because the training and test datasets share a lot of information. On non-overlapping dataset partitions, well-performing models suffer significant performance degradation. To this end, we propose a spectral-spatial axial aggregation transformer model, namely SaaFormer, that preserves generalization across dataset partitions. SaaFormer applies a multi-level spectral extraction structure to segment the spectrum into multiple spectrum clips, such that the wavelength continuity of the spectrum across the channel are preserved. For each spectrum clip, the axial aggregation attention mechanism, which integrates spatial features along multiple spectral axes is applied to mine the spectral characteristic. The multi-level spectral extraction and the axial aggregation attention emphasize spectral characteristic to improve the model generalization. The experimental results on five publicly available datasets demonstrate that our model exhibits comparable performance on the random partition, while significantly outperforming other methods on non-overlapping partitions. Moreover, SaaFormer shows excellent performance on background classification.
CVNov 19, 2022
Real-World Image Super Resolution via Unsupervised Bi-directional Cycle Domain Transfer Learning based Generative Adversarial NetworkXiang Wang, Yimin Yang, Zhichang Guo et al.
Deep Convolutional Neural Networks (DCNNs) have exhibited impressive performance on image super-resolution tasks. However, these deep learning-based super-resolution methods perform poorly in real-world super-resolution tasks, where the paired high-resolution and low-resolution images are unavailable and the low-resolution images are degraded by complicated and unknown kernels. To break these limitations, we propose the Unsupervised Bi-directional Cycle Domain Transfer Learning-based Generative Adversarial Network (UBCDTL-GAN), which consists of an Unsupervised Bi-directional Cycle Domain Transfer Network (UBCDTN) and the Semantic Encoder guided Super Resolution Network (SESRN). First, the UBCDTN is able to produce an approximated real-like LR image through transferring the LR image from an artificially degraded domain to the real-world LR image domain. Second, the SESRN has the ability to super-resolve the approximated real-like LR image to a photo-realistic HR image. Extensive experiments on unpaired real-world image benchmark datasets demonstrate that the proposed method achieves superior performance compared to state-of-the-art methods.
LGFeb 19, 2023
Stationary Point Losses for Robust ModelWeiwei Gao, Dazhi Zhang, Yao Li et al.
The inability to guarantee robustness is one of the major obstacles to the application of deep learning models in security-demanding domains. We identify that the most commonly used cross-entropy (CE) loss does not guarantee robust boundary for neural networks. CE loss sharpens the neural network at the decision boundary to achieve a lower loss, rather than pushing the boundary to a more robust position. A robust boundary should be kept in the middle of samples from different classes, thus maximizing the margins from the boundary to the samples. We think this is due to the fact that CE loss has no stationary point. In this paper, we propose a family of new losses, called stationary point (SP) loss, which has at least one stationary point on the correct classification side. We proved that robust boundary can be guaranteed by SP loss without losing much accuracy. With SP loss, larger perturbations are required to generate adversarial examples. We demonstrate that robustness is improved under a variety of adversarial attacks by applying SP loss. Moreover, robust boundary learned by SP loss also performs well on imbalanced datasets.
18.2CVMay 23
SILSM: A Sustainable Interactive Level Set Method for Progressive RefinementJiachen Song, Dazhi Zhang, Fanghui Song et al.
Interactive segmentation aims to precisely isolate target objects using sparse user guidance. However, traditional methods often suffer from heavy interaction burdens and parameter sensitivity, while deep learning approaches struggle with data dependency and iterative instability. Motivated by these limitations, we propose the Sustainable Interactive Level Set Method (SILSM). The proposed level set evolution equation incorporates interaction, regularization, and segmentation terms. Specifically, high-order regularization is employed to maintain numerical stability, and unlike traditional methods, we decouple user guidance into an independent interaction term to enable direct manual control over the zero-level set evolution. Furthermore, we develop a numerical algorithm tailored for multiple interactions, which facilitates dynamic refinement by effectively updating the segmentation results based on sequential user inputs. We theoretically demonstrate that the high-order term provides stronger regularization constraints than the conventional length term, while the interaction term ensures segmentation strictly within the user-selected region. Experimental results further demonstrate that the proposed method is robust to interactive inputs, achieves competitive performance at the first interaction, and supports stable multi-round interactions with progressively improved segmentation quality.
CVOct 13, 2023
Re-initialization-free Level Set Method via Molecular Beam Epitaxy Equation Regularization for Image SegmentationFanghui Song, Jiebao Sun, Shengzhu Shi et al.
Variational level set method has become a powerful tool in image segmentation due to its ability to handle complex topological changes and maintain continuity and smoothness in the process of evolution. However its evolution process can be unstable, which results in over flatted or over sharpened contours and segmentation failure. To improve the accuracy and stability of evolution, we propose a high-order level set variational segmentation method integrated with molecular beam epitaxy (MBE) equation regularization. This method uses the crystal growth in the MBE process to limit the evolution of the level set function, and thus can avoid the re-initialization in the evolution process and regulate the smoothness of the segmented curve. It also works for noisy images with intensity inhomogeneity, which is a challenge in image segmentation. To solve the variational model, we derive the gradient flow and design scalar auxiliary variable (SAV) scheme coupled with fast Fourier transform (FFT), which can significantly improve the computational efficiency compared with the traditional semi-implicit and semi-explicit scheme. Numerical experiments show that the proposed method can generate smooth segmentation curves, retain fine segmentation targets and obtain robust segmentation results of small objects. Compared to existing level set methods, this model is state-of-the-art in both accuracy and efficiency.
LGOct 18, 2023
PINNsFailureRegion Localization and Refinement through White-box AdversarialAttackShengzhu Shi, Yao Li, Zhichang Guo et al.
Physics-informed neural networks (PINNs) have shown great promise in solving partial differential equations (PDEs). However, vanilla PINNs often face challenges when solving complex PDEs, especially those involving multi-scale behaviors or solutions with sharp or oscillatory characteristics. To precisely and adaptively locate the critical regions that fail in the solving process we propose a sampling strategy grounded in white-box adversarial attacks, referred to as WbAR. WbAR search for failure regions in the direction of the loss gradient, thus directly locating the most critical positions. WbAR generates adversarial samples in a random walk manner and iteratively refines PINNs to guide the model's focus towards dynamically updated critical regions during training. We implement WbAR to the elliptic equation with multi-scale coefficients, Poisson equation with multi-peak solutions, high-dimensional Poisson equations, and Burgers equation with sharp solutions. The results demonstrate that WbAR can effectively locate and reduce failure regions. Moreover, WbAR is suitable for solving complex PDEs, since locating failure regions through adversarial attacks is independent of the size of failure regions or the complexity of the distribution.
LGNov 12, 2025
PDAC: Efficient Coreset Selection for Continual Learning via Probability Density AwarenessJunqi Gao, Zhichang Guo, Dazhi Zhang et al.
Rehearsal-based Continual Learning (CL) maintains a limited memory buffer to store replay samples for knowledge retention, making these approaches heavily reliant on the quality of the stored samples. Current Rehearsal-based CL methods typically construct the memory buffer by selecting a representative subset (referred to as coresets), aiming to approximate the training efficacy of the full dataset with minimal storage overhead. However, mainstream Coreset Selection (CS) methods generally formulate the CS problem as a bi-level optimization problem that relies on numerous inner and outer iterations to solve, leading to substantial computational cost thus limiting their practical efficiency. In this paper, we aim to provide a more efficient selection logic and scheme for coreset construction. To this end, we first analyze the Mean Squared Error (MSE) between the buffer-trained model and the Bayes-optimal model through the perspective of localized error decomposition to investigate the contribution of samples from different regions to MSE suppression. Further theoretical and experimental analyses demonstrate that samples with high probability density play a dominant role in error suppression. Inspired by this, we propose the Probability Density-Aware Coreset (PDAC) method. PDAC leverages the Projected Gaussian Mixture (PGM) model to estimate each sample's joint density, enabling efficient density-prioritized buffer selection. Finally, we introduce the streaming Expectation Maximization (EM) algorithm to enhance the adaptability of PGM parameters to streaming data, yielding Streaming PDAC (SPDAC) for streaming scenarios. Extensive comparative experiments show that our methods outperforms other baselines across various CL settings while ensuring favorable efficiency.
IVApr 11, 2024Code
Diffusion Probabilistic Multi-cue Level Set for Reducing Edge Uncertainty in Pancreas SegmentationYue Gou, Yuming Xing, Shengzhu Shi et al.
Accurately segmenting the pancreas remains a huge challenge. Traditional methods encounter difficulties in semantic localization due to the small volume and distorted structure of the pancreas, while deep learning methods encounter challenges in obtaining accurate edges because of low contrast and organ overlapping. To overcome these issues, we propose a multi-cue level set method based on the diffusion probabilistic model, namely Diff-mcs. Our method adopts a coarse-to-fine segmentation strategy. We use the diffusion probabilistic model in the coarse segmentation stage, with the obtained probability distribution serving as both the initial localization and prior cues for the level set method. In the fine segmentation stage, we combine the prior cues with grayscale cues and texture cues to refine the edge by maximizing the difference between probability distributions of the cues inside and outside the level set curve. The method is validated on three public datasets and achieves state-of-the-art performance, which can obtain more accurate segmentation results with lower uncertainty segmentation edges. In addition, we conduct ablation studies and uncertainty analysis to verify that the diffusion probability model provides a more appropriate initialization for the level set method. Furthermore, when combined with multiple cues, the level set method can better obtain edges and improve the overall accuracy. Our code is available at https://github.com/GOUYUEE/Diff-mcs.
LGJun 4, 2025Code
Bohdi: Heterogeneous LLM Fusion with Automatic Data ExplorationJunqi Gao, Zhichang Guo, Dazhi Zhang et al.
Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from limited domain for knowledge fusion, preventing the target LLM from fully acquiring knowledge across diverse domains, and 2) fixed data allocation proportions across domains, failing to dynamically adjust according to the target LLM's varying capabilities across domains, leading to a capability imbalance. To overcome these limitations, we propose Bohdi, a synthetic-data-only heterogeneous LLM fusion framework. Through the organization of knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation through multi-model collaboration, thereby comprehensively extracting knowledge from source LLMs. By formalizing domain expansion and data sampling proportion allocation on the knowledge tree as a Hierarchical Multi-Armed Bandit problem, Bohdi leverages the designed DynaBranches mechanism to adaptively adjust sampling proportions based on the target LLM's performance feedback across domains. Integrated with our proposed Introspection-Rebirth (IR) mechanism, DynaBranches dynamically tracks capability shifts during target LLM's updates via Sliding Window Binomial Likelihood Ratio Testing (SWBLRT), further enhancing its online adaptation capability. Comparative experimental results on a comprehensive suite of benchmarks demonstrate that Bohdi significantly outperforms existing baselines on multiple target LLMs, exhibits higher data efficiency, and virtually eliminates the imbalance in the target LLM's capabilities. Our code is available at https://github.com/gjq100/Bohdi.git.
LGJun 8, 2024Code
Perturbation Towards Easy Samples Improves Targeted Adversarial TransferabilityJunqi Gao, Biqing Qi, Yao Li et al.
The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrated that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class instead of low sample density regions. Therefore, in the target setting, adding perturbations towards HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios. Further theoretical and experimental verification demonstrates that easy samples with low loss are more likely to be located in HSDR. Perturbations towards such easy samples in the target class can avoid density estimation for HSDR location. Based on the above facts, we verified that adding perturbations to easy samples in the target class improves targeted adversarial transferability of existing attack methods. A generative targeted attack strategy named Easy Sample Matching Attack (ESMA) is proposed, which has a higher success rate for targeted attacks and outperforms the SOTA generative method. Moreover, ESMA requires only 5% of the storage space and much less computation time comparing to the current SOTA, as ESMA attacks all classes with only one model instead of seperate models for each class. Our code is available at https://github.com/gjq100/ESMA.
CVNov 8, 2025
Towards Frequency-Adaptive Learning for SAR DespecklingZiqing Ma, Chang Yang, Zhichang Guo et al.
Synthetic Aperture Radar (SAR) images are inherently corrupted by speckle noise, limiting their utility in high-precision applications. While deep learning methods have shown promise in SAR despeckling, most methods employ a single unified network to process the entire image, failing to account for the distinct speckle statistics associated with different spatial physical characteristics. It often leads to artifacts, blurred edges, and texture distortion. To address these issues, we propose SAR-FAH, a frequency-adaptive heterogeneous despeckling model based on a divide-and-conquer architecture. First, wavelet decomposition is used to separate the image into frequency sub-bands carrying different intrinsic characteristics. Inspired by their differing noise characteristics, we design specialized sub-networks for different frequency components. The tailored approach leverages statistical variations across frequencies, improving edge and texture preservation while suppressing noise. Specifically, for the low-frequency part, denoising is formulated as a continuous dynamic system via neural ordinary differential equations, ensuring structural fidelity and sufficient smoothness that prevents artifacts. For high-frequency sub-bands rich in edges and textures, we introduce an enhanced U-Net with deformable convolutions for noise suppression and enhanced features. Extensive experiments on synthetic and real SAR images validate the superior performance of the proposed model in noise removal and structural preservation.
CVNov 2, 2025
TA-LSDiff:Topology-Aware Diffusion Guided by a Level Set Energy for Pancreas SegmentationYue Gou, Fanghui Song, Yuming Xing et al.
Pancreas segmentation in medical image processing is a persistent challenge due to its small size, low contrast against adjacent tissues, and significant topological variations. Traditional level set methods drive boundary evolution using gradient flows, often ignoring pointwise topological effects. Conversely, deep learning-based segmentation networks extract rich semantic features but frequently sacrifice structural details. To bridge this gap, we propose a novel model named TA-LSDiff, which combined topology-aware diffusion probabilistic model and level set energy, achieving segmentation without explicit geometric evolution. This energy function guides implicit curve evolution by integrating the input image and deep features through four complementary terms. To further enhance boundary precision, we introduce a pixel-adaptive refinement module that locally modulates the energy function using affinity weighting from neighboring evidence. Ablation studies systematically quantify the contribution of each proposed component. Evaluations on four public pancreas datasets demonstrate that TA-LSDiff achieves state-of-the-art accuracy, outperforming existing methods. These results establish TA-LSDiff as a practical and accurate solution for pancreas segmentation.
CVNov 12, 2025
An ICTM-RMSAV Framework for Bias-Field Aware Image Segmentation under Poisson and Multiplicative NoiseXinyu Wang, Wenjun Yao, Fanghui Song et al.
Image segmentation is a core task in image processing, yet many methods degrade when images are heavily corrupted by noise and exhibit intensity inhomogeneity. Within the iterative-convolution thresholding method (ICTM) framework, we propose a variational segmentation model that integrates denoising terms. Specifically, the denoising component consists of an I-divergence term and an adaptive total-variation (TV) regularizer, making the model well suited to images contaminated by Gamma--distributed multiplicative noise and Poisson noise. A spatially adaptive weight derived from a gray-level indicator guides diffusion differently across regions of varying intensity. To further address intensity inhomogeneity, we estimate a smoothly varying bias field, which improves segmentation accuracy. Regions are represented by characteristic functions, with contour length encoded accordingly. For efficient optimization, we couple ICTM with a relaxed modified scalar auxiliary variable (RMSAV) scheme. Extensive experiments on synthetic and real-world images with intensity inhomogeneity and diverse noise types show that the proposed model achieves superior accuracy and robustness compared with competing approaches.
88.3LGApr 30
Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector CompressionJunqi Gao, Dazhi Zhang, Zhichang Guo et al.
Model merging has attracted attention as an effective path toward multi-task adaptation by integrating knowledge from multiple task-specific models. Among existing approaches, dynamic merging mitigates performance degradation caused by conflicting parameter updates across tasks by flexibly combining task-specific parameters at inference time, thereby maintaining high performance. However, these methods require storing independent parameters for each task, resulting in prohibitive storage overhead. To address this issue, we first experimentally demonstrate that the fine-tuned weight increments (referred to as task vectors) exhibit an impulse-like activation pattern and high robustness to low-bit representations. Driven by this insight, we propose T-Switch, which decomposes task vectors into three compact components: a binary sparse mask, a sign vector, and a scalar scaling factor, achieving high-fidelity approximation at high compression ratios. We then introduce Auto-Switch, a training-free merging scheme that automatically composes task vectors via feature similarity retrieval. Building on this, we develop Auto-Switch, a training-free merging scheme that automatically assembles task vectors through feature similarity retrieval. Furthermore, to transform task vector sparsification and quantization from static rules to adaptive learning, we propose FlexSwitch, a learnable framework which jointly optimizes the compression strategy for each model unit via Learnable Gating Sparsification (LGS) and Bit-width Adaptive Selection (BAS), while employing the Sparsity-Aware Storage Strategy (SASS) to select the optimal storage encoding structure. Finally, by incorporating a K-Nearest Neighbor (KNN) inference scheme with a learnable low-rank metric, we present Auto-FlexSwitch, a dynamic model merging approach that supports highly efficient task vector compression.
CVJun 25, 2025
Progressive Alignment Degradation Learning for PansharpeningEnzhe Zhao, Zhichang Guo, Yao Li et al.
Deep learning-based pansharpening has been shown to effectively generate high-resolution multispectral (HRMS) images. To create supervised ground-truth HRMS images, synthetic data generated using the Wald protocol is commonly employed. This protocol assumes that networks trained on artificial low-resolution data will perform equally well on high-resolution data. However, well-trained models typically exhibit a trade-off in performance between reduced-resolution and full-resolution datasets. In this paper, we delve into the Wald protocol and find that its inaccurate approximation of real-world degradation patterns limits the generalization of deep pansharpening models. To address this issue, we propose the Progressive Alignment Degradation Module (PADM), which uses mutual iteration between two sub-networks, PAlignNet and PDegradeNet, to adaptively learn accurate degradation processes without relying on predefined operators. Building on this, we introduce HFreqdiff, which embeds high-frequency details into a diffusion framework and incorporates CFB and BACM modules for frequency-selective detail extraction and precise reverse process learning. These innovations enable effective integration of high-resolution panchromatic and multispectral images, significantly enhancing spatial sharpness and quality. Experiments and ablation studies demonstrate the proposed method's superior performance compared to state-of-the-art techniques.
CVDec 21, 2024
Mixed geometry information regularization for image multiplicative denoisingShengkun Yang, Zhichang Guo, Jia Li et al.
This paper focuses on solving the multiplicative gamma denoising problem via a variation model. Variation-based regularization models have been extensively employed in a variety of inverse problem tasks in image processing. However, sufficient geometric priors and efficient algorithms are still very difficult problems in the model design process. To overcome these issues, in this paper we propose a mixed geometry information model, incorporating area term and curvature term as prior knowledge. In addition to its ability to effectively remove multiplicative noise, our model is able to preserve edges and prevent staircasing effects. Meanwhile, to address the challenges stemming from the nonlinearity and non-convexity inherent in higher-order regularization, we propose the efficient additive operator splitting algorithm (AOS) and scalar auxiliary variable algorithm (SAV). The unconditional stability possessed by these algorithms enables us to use large time step. And the SAV method shows higher computational accuracy in our model. We employ the second order SAV algorithm to further speed up the calculation while maintaining accuracy. We demonstrate the effectiveness and efficiency of the model and algorithms by a lot of numerical experiments, where the model we proposed has better features texturepreserving properties without generating any false information.
CVDec 8, 2024
Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set SamplingJie Ning, Jiebao Sun, Shengzhu Shi et al.
Deep learning-based image denoising models demonstrate remarkable performance, but their lack of robustness analysis remains a significant concern. A major issue is that these models are susceptible to adversarial attacks, where small, carefully crafted perturbations to input data can cause them to fail. Surprisingly, perturbations specifically crafted for one model can easily transfer across various models, including CNNs, Transformers, unfolding models, and plug-and-play models, leading to failures in those models as well. Such high adversarial transferability is not observed in classification models. We analyze the possible underlying reasons behind the high adversarial transferability through a series of hypotheses and validation experiments. By characterizing the manifolds of Gaussian noise and adversarial perturbations using the concept of typical set and the asymptotic equipartition property, we prove that adversarial samples deviate slightly from the typical set of the original input distribution, causing the models to fail. Based on these insights, we propose a novel adversarial defense method: the Out-of-Distribution Typical Set Sampling Training strategy (TS). TS not only significantly enhances the model's robustness but also marginally improves denoising performance compared to the original model.
CVNov 24, 2024
A Tunable Despeckling Neural Network Stabilized via Diffusion EquationYi Ran, Zhichang Guo, Jia Li et al.
The removal of multiplicative Gamma noise is a critical research area in the application of synthetic aperture radar (SAR) imaging, where neural networks serve as a potent tool. However, real-world data often diverges from theoretical models, exhibiting various disturbances, which makes the neural network less effective. Adversarial attacks can be used as a criterion for judging the adaptability of neural networks to real data, since adversarial attacks can find the most extreme perturbations that make neural networks ineffective. In this work, the diffusion equation is designed as a regularization block to provide sufficient regularity to the whole neural network, due to its spontaneous dissipative nature. We propose a tunable, regularized neural network framework that unrolls a shallow denoising neural network block and a diffusion regularity block into a single network for end-to-end training. The linear heat equation, known for its inherent smoothness and low-pass filtering properties, is adopted as the diffusion regularization block. In our model, a single time step hyperparameter governs the smoothness of the outputs and can be adjusted dynamically, significantly enhancing flexibility. The stability and convergence of our model are theoretically proven. Experimental results demonstrate that the proposed model effectively eliminates high-frequency oscillations induced by adversarial attacks. Finally, the proposed model is benchmarked against several state-of-the-art denoising methods on simulated images, adversarial samples, and real SAR images, achieving superior performance in both quantitative and visual evaluations.