LG, AI | Aug 28, 2023

DiffSmooth: Certifiably Robust Learning via Diffusion Models and Local Smoothing

Jiawei Zhang, Zhongzhu Chen, Huan Zhang, Chaowei Xiao, Bo Li

arXiv:2308.14333v1 | 20.433 citations | h-index: 70 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses adversarial robustness in machine learning models, particularly for image classification, by proposing a pipeline that strengthens certified defenses. It builds incrementally on existing methods, namely diffusion models and smoothing.

The paper improves certified robustness against adversarial attacks by combining diffusion-based adversarial purification with local smoothing. It reports state-of-the-art certified accuracy, for example an increase from 36.0% to 53.0% under an ℓ₂ radius of 1.5 on ImageNet.

Diffusion models have been leveraged to perform adversarial purification and thus provide both empirical and certified robustness for a standard model. Separately, various robustly trained smoothed models have been studied to improve certified robustness. This raises a natural question: can diffusion models be used to achieve improved certified robustness on those robustly trained smoothed models? In this work, we first theoretically show that instances recovered by diffusion models lie in a bounded neighborhood of the original instance with high probability, and that "one-shot" denoising diffusion probabilistic models (DDPM) can approximate the mean of the generated distribution of a continuous-time diffusion model, which approximates the original instance under mild conditions. Inspired by our analysis, we propose a certifiably robust pipeline, DiffSmooth, which first performs adversarial purification via diffusion models and then maps the purified instances to a common region via a simple yet effective local smoothing strategy. We conduct extensive experiments on different datasets and show that DiffSmooth achieves state-of-the-art (SOTA) certified robustness compared with eight baselines. For instance, DiffSmooth improves the SOTA certified accuracy from $36.0\%$ to $53.0\%$ under $\ell_2$ radius $1.5$ on ImageNet. The code is available at https://github.com/javyduck/DiffSmooth.
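The two-stage pipeline described in the abstract (purify, then locally smooth) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the API of their repository. The names `toy_denoiser`, `toy_classifier`, `local_smoothing_label`, `diffsmooth_predict`, and `certified_radius` are hypothetical, the denoiser and classifier are toy stand-ins for a diffusion model and a trained network, and the radius formula is the standard randomized-smoothing bound, which is assumed here because the abstract does not spell it out.

```python
from statistics import NormalDist

import numpy as np


def toy_denoiser(x_noisy, sigma):
    """Hypothetical stand-in for one-shot diffusion denoising (here: just clipping)."""
    return np.clip(x_noisy, 0.0, 1.0)


def toy_classifier(x):
    """Hypothetical base classifier: favors class 0 for bright inputs, class 1 for dark."""
    m = float(x.mean())
    logits = 20.0 * np.array([m - 0.5, 0.5 - m])
    e = np.exp(logits - logits.max())
    return e / e.sum()


def local_smoothing_label(x_purified, classifier, sigma_local, m, rng):
    """Average class probabilities over m small Gaussian perturbations, then argmax."""
    probs = [
        classifier(x_purified + rng.normal(0.0, sigma_local, x_purified.shape))
        for _ in range(m)
    ]
    return int(np.argmax(np.mean(probs, axis=0)))


def diffsmooth_predict(x, denoiser, classifier, sigma, sigma_local, n, m, num_classes, rng):
    """Majority vote over n rounds of: add noise, purify, locally smooth."""
    labels = []
    for _ in range(n):
        x_noisy = x + rng.normal(0.0, sigma, x.shape)
        x_purified = denoiser(x_noisy, sigma)
        labels.append(local_smoothing_label(x_purified, classifier, sigma_local, m, rng))
    counts = np.bincount(labels, minlength=num_classes)
    return int(np.argmax(counts)), counts


def certified_radius(p_lower, sigma):
    """l2 radius sigma * Phi^{-1}(p_lower); 0.0 (abstain) unless 0.5 < p_lower < 1."""
    if not 0.5 < p_lower < 1.0:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)


# Toy usage: a uniformly bright 8x8 "image" with pixel values in [0, 1].
rng = np.random.default_rng(0)
x = np.full((8, 8), 0.9)
label, counts = diffsmooth_predict(
    x, toy_denoiser, toy_classifier,
    sigma=0.25, sigma_local=0.05, n=50, m=5, num_classes=2, rng=rng,
)
print(label, counts)
```

In a real certification procedure, `p_lower` would be a high-confidence lower bound on the top-class vote probability estimated from the counts. The sketch takes it as an input rather than computing it.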
