Chengjin Sun

LG
6 papers
172 citations
Novelty 57%
AI Score 26

6 Papers

CV · Mar 4, 2020
Double Backpropagation for Training Autoencoders against Adversarial Attack

Chengjin Sun, Sizhe Chen, Xiaolin Huang

Deep learning is widely known to be vulnerable to adversarial samples. This paper focuses on adversarial attacks on autoencoders (AEs). The safety of AEs matters because they are widely used as a compression scheme for data storage and transmission; however, current autoencoders are easily attacked, i.e., one can slightly modify an input and obtain a totally different code. The vulnerability is rooted in the sensitivity of the autoencoders. To enhance robustness, we propose to adopt double backpropagation (DBP) to secure autoencoders such as VAE and DRAW. We restrict the gradient from the reconstructed image to the original one so that the autoencoder is not sensitive to the trivial perturbations produced by an adversarial attack. After smoothing the gradient with DBP, we further smooth the labels with a Gaussian Mixture Model (GMM), aiming for accurate and robust classification. We demonstrate on MNIST, CelebA, and SVHN that our method leads to a robust autoencoder resistant to attack and, when combined with GMM, a robust classifier capable of image transition and immune to adversarial attack.
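The idea of restricting the gradient from the reconstruction to the input can be illustrated with a minimal sketch. It is not the paper's implementation: the scalar tanh autoencoder, the function names, and the `penalty_weight` parameter are all assumptions made for illustration, and the exact form of the DBP penalty used by the authors is not given in the abstract.

```python
import math

def reconstruct(x, w_enc, w_dec):
    """Toy scalar autoencoder (hypothetical): tanh encoder, linear decoder."""
    return w_dec * math.tanh(w_enc * x)

def input_sensitivity(x, w_enc, w_dec):
    """Analytic derivative of the reconstruction with respect to the input."""
    h = math.tanh(w_enc * x)
    return w_dec * (1.0 - h * h) * w_enc

def dbp_loss(x, w_enc, w_dec, penalty_weight=0.1):
    """Reconstruction error plus a penalty on the input sensitivity.

    The squared-sensitivity penalty is an assumed form of the DBP term.
    """
    recon_error = (reconstruct(x, w_enc, w_dec) - x) ** 2
    penalty = input_sensitivity(x, w_enc, w_dec) ** 2
    return recon_error + penalty_weight * penalty

# Check the analytic sensitivity against a central finite difference.
x = 0.3
eps = 1e-6
numeric = (reconstruct(x + eps, 1.5, 2.0) - reconstruct(x - eps, 1.5, 2.0)) / (2 * eps)
analytic = input_sensitivity(x, 1.5, 2.0)
```

Training on `dbp_loss` would require differentiating the penalty, itself a gradient, with respect to the weights, which is where the name "double backpropagation" comes from.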

CV · Mar 4, 2020
Type I Attack for Generative Models

Chengjin Sun, Sizhe Chen, Jia Cai et al.

Generative models are popular tools with a wide range of applications. Nevertheless, they are as vulnerable to adversarial samples as classifiers. Existing attack methods mainly focus on generating adversarial examples by adding imperceptible perturbations to the input, which leads to wrong results. In contrast, we focus on another aspect of attack, i.e., cheating models by significant changes. The former induces Type II error and the latter causes Type I error. In this paper, we propose a Type I attack on generative models such as VAE and GAN. One example given for VAE is that we can change an original image significantly into a meaningless one while the reconstruction results remain similar. To implement the Type I attack, we destroy the original input by increasing the distance in input space while keeping the output similar, which is possible because, in deep neural networks, different inputs may correspond to similar features. Experimental results show that our attack method is effective at generating Type I adversarial examples for generative models on large-scale image datasets.
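The objective described above (increase the input distance while keeping the output similar) can be sketched on a toy problem. Everything here is an assumption for illustration: `model` is a hypothetical two-input linear map standing in for a generative model, and the step count, learning rate, and `output_weight` are arbitrary choices rather than values from the paper.

```python
import math

def model(x):
    """Toy stand-in for a generative model: maps a 2-D input to one output."""
    return x[0] + x[1]

def type1_attack(x, steps=50, lr=0.05, output_weight=10.0):
    """Push the input away from x while penalizing any change in the output."""
    adv = [x[0] + 0.01, x[1]]  # small offset so the gradient is non-zero
    target = model(x)
    for _ in range(steps):
        out_diff = model(adv) - target
        # Gradient of output_weight * out_diff**2 - ||adv - x||**2 w.r.t. adv.
        grad = [2 * output_weight * out_diff - 2 * (adv[i] - x[i]) for i in range(2)]
        adv = [adv[i] - lr * grad[i] for i in range(2)]
    return adv

x = [1.0, 2.0]
adv = type1_attack(x)
input_distance = math.dist(adv, x)                # grows with each step
output_distance = abs(model(adv) - model(x))      # shrinks toward zero
```

In this toy setting the attack drifts along the direction that the model ignores, so the input changes substantially while the output stays almost the same.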

LG · Jan 21, 2020
HRFA: High-Resolution Feature-based Attack

Zhixing Ye, Sizhe Chen, Peidong Zhang et al.

Adversarial attacks have long been developed to reveal the vulnerability of Deep Neural Networks (DNNs) by adding imperceptible perturbations to the input. Most methods generate perturbations that resemble ordinary noise, which is neither interpretable nor semantically meaningful. In this paper, we propose High-Resolution Feature-based Attack (HRFA), which yields authentic adversarial examples at up to $1024 \times 1024$ resolution. HRFA attacks by modifying the latent feature representation of the image, i.e., the gradients back-propagate not only through the victim DNN but also through the generative model that maps the feature space to the image space. In this way, HRFA generates adversarial examples that are high-resolution, realistic, and noise-free, and hence able to evade several denoising-based defenses. In the experiments, the effectiveness of HRFA is validated by attacking object classification and face verification tasks with BigGAN and StyleGAN, respectively. The advantages of HRFA are its high image quality, high authenticity, and high attack success rate in the face of defenses.
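A latent-space attack of this kind can be sketched with a toy example in which the gradient flows through both the classifier and the generator. This is only an illustration under stated assumptions: the linear `generator`, the `CLASSIFIER_WEIGHTS`, and the stopping rule are hypothetical and are not taken from HRFA, which uses BigGAN and StyleGAN.

```python
def generator(z):
    """Toy linear generator mapping a 2-D latent code to a 3-pixel image."""
    return [z[0], z[1], z[0] + z[1]]

CLASSIFIER_WEIGHTS = [1.0, -0.5, 0.5]  # hypothetical victim model

def score(image):
    """Victim classifier score; positive means the original class."""
    return sum(w * p for w, p in zip(CLASSIFIER_WEIGHTS, image))

def latent_gradient():
    """Chain rule: d score / d z = (d image / d z)^T (d score / d image)."""
    w = CLASSIFIER_WEIGHTS
    return [w[0] + w[2], w[1] + w[2]]

def latent_attack(z, lr=0.2, max_steps=100):
    """Descend the score in latent space until the prediction flips."""
    z = list(z)
    for _ in range(max_steps):
        if score(generator(z)) < 0:
            break
        g = latent_gradient()
        z = [z[i] - lr * g[i] for i in range(2)]
    return z

z0 = [1.0, 0.5]
z_adv = latent_attack(z0)
adv_image = generator(z_adv)  # the adversarial image is still a generator output
```

Because the perturbation is applied to the latent code rather than to pixels, the adversarial image remains a generator output, which is the property the abstract relies on for realistic, noise-free examples.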

LG · Jan 16, 2020
Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet

Sizhe Chen, Zhengbao He, Chengjin Sun et al.

Adversarial attacks on deep neural networks (DNNs) have been known for several years. However, existing adversarial attacks have high success rates only when the information of the victim DNN is well known or can be estimated through structural similarity or massive queries. In this paper, we propose Attack on Attention (AoA), which targets attention, a semantic property commonly shared by DNNs. AoA enjoys a significant increase in transferability when the traditional cross-entropy loss is replaced with the attention loss. Since AoA alters only the loss function, it can easily be combined with other transferability-enhancement techniques to achieve SOTA performance. We apply AoA to generate 50000 adversarial samples from the ImageNet validation set to defeat many neural networks, and name the resulting dataset DAmageNet. Thirteen well-trained DNNs are tested on DAmageNet, and all of them have an error rate over 85%. Even with defenses or adversarial training, most models still have an error rate over 70% on DAmageNet. DAmageNet is the first universal adversarial dataset. It can be downloaded freely and can serve as a benchmark for robustness testing and adversarial training.

LG · Dec 16, 2019
DAmageNet: A Universal Adversarial Dataset

Sizhe Chen, Xiaolin Huang, Zhengbao He et al.

It is now well known that deep neural networks (DNNs) are vulnerable to adversarial attack. Adversarial samples are similar to the clean ones, but are able to cheat the attacked DNN into producing incorrect predictions with high confidence. However, most existing adversarial attacks have a high success rate only when the information of the attacked DNN is well known or can be estimated by massive queries. A promising alternative is to generate adversarial samples with high transferability. In this way, we generate 96020 transferable adversarial samples from original images in ImageNet. The average difference, measured by root mean squared deviation, is only around 3.8. However, the adversarial samples are misclassified by various models with an error rate of up to 90%. Since the images are generated independently of the attacked DNNs, this is essentially a zero-query adversarial attack. We call the dataset DAmageNet, which is the first universal adversarial dataset that beats many models trained on ImageNet. By revealing these drawbacks, DAmageNet can serve as a benchmark to study and improve the robustness of DNNs. DAmageNet can be downloaded from http://www.pami.sjtu.edu.cn/Show/56/122.
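For reference, the root mean squared deviation between a clean image and its adversarial counterpart can be computed as in the short sketch below. The pixel values are invented for illustration, and the 0 to 255 intensity scale is an assumption, since the abstract does not state the scale on which the 3.8 figure is measured.

```python
import math

def rmsd(clean, adversarial):
    """Root mean squared deviation between two equally sized pixel lists."""
    squared = [(a - b) ** 2 for a, b in zip(clean, adversarial)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical 4-pixel example on an assumed 0-255 intensity scale.
clean = [120, 64, 200, 33]
adversarial = [124, 60, 197, 36]
value = rmsd(clean, adversarial)
```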

LG · Sep 3, 2018
Adversarial Attack Type I: Cheat Classifiers by Significant Changes

Sanli Tang, Xiaolin Huang, Mingjian Chen et al.

Despite the great success of deep neural networks, adversarial attacks can cheat some well-trained classifiers with small perturbations. In this paper, we propose another type of adversarial attack that can cheat classifiers by significant changes. For example, we can significantly change a face, yet well-trained neural networks still recognize the adversarial and the original example as the same person. Statistically, the existing adversarial attack increases Type II error and the proposed one aims at Type I error; they are hence named the Type II and Type I adversarial attack, respectively. The two types of attack are equally important but essentially different, as is intuitively explained and numerically evaluated. To implement the proposed attack, a supervised variational autoencoder is designed, and the classifier is then attacked by updating the latent variables using gradient information. In addition, Type I attack on the latent spaces of pre-trained generative models is investigated. Experimental results show that our method is practical and effective at generating Type I adversarial examples on large-scale image datasets. Most of the generated examples can pass detectors designed to defend against Type II attack, and the strengthening strategy is effective only against a specific type of attack, both implying that the underlying causes of Type I and Type II attack are different.