LG AI CR · Sep 12, 2024

LoRID: Low-Rank Iterative Diffusion for Adversarial Purification

Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai

arXiv:2409.08255v1 · 12.510 citations · h-index: 25

Originality: Incremental advance

AI Analysis

The paper addresses adversarial defense for AI systems and offers an incremental improvement over existing diffusion-based purification methods.

This paper tackles the problem of adversarial attacks on machine learning models by introducing LoRID, a low-rank iterative diffusion method for adversarial purification, which achieves superior robustness performance on datasets like CIFAR-10/100, CelebA-HQ, and ImageNet under white-box and black-box settings.

This work presents an information-theoretic examination of diffusion-based purification methods, the state-of-the-art adversarial defenses that utilize diffusion models to remove malicious perturbations in adversarial examples. By theoretically characterizing the inherent purification errors associated with the Markov-based diffusion purifications, we introduce LoRID, a novel Low-Rank Iterative Diffusion purification method designed to remove adversarial perturbation with low intrinsic purification errors. LoRID centers around a multi-stage purification process that leverages multiple rounds of diffusion-denoising loops at the early time-steps of the diffusion models, and the integration of Tucker decomposition, an extension of matrix factorization, to remove adversarial noise at high-noise regimes. Consequently, LoRID increases the effective diffusion time-steps and overcomes strong adversarial attacks, achieving superior robustness performance in CIFAR-10/100, CelebA-HQ, and ImageNet datasets under both white-box and black-box settings.
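The two ingredients named in the abstract, a Tucker low-rank step and repeated diffusion-denoising loops at early time-steps, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the truncated HOSVD used as the Tucker variant, the DDPM-style forward step, the ordering (low-rank projection first, loops second), and the `denoise` callable standing in for a trained diffusion model are all assumptions made for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))  # new axis comes first
    return np.moveaxis(out, 0, mode)


def tucker_lowrank(x, ranks):
    """Low-rank approximation via truncated HOSVD (one form of Tucker)."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u[:, :r])  # leading r left singular vectors
    core = x
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # compress each mode into the core
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_dot(approx, u, mode)  # expand back to the input shape
    return approx


def diffuse(x, alpha_bar, rng):
    """DDPM-style forward step; alpha_bar near 1 means an early time-step."""
    noise = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * noise


def purify(x_adv, denoise, alpha_bar, loops, ranks, rng):
    """Hypothetical pipeline: low-rank projection, then repeated loops.

    `denoise(x_t, alpha_bar)` is a placeholder for a trained diffusion
    model's reverse process; the ordering of the steps is an assumption.
    """
    x = tucker_lowrank(x_adv, ranks)
    for _ in range(loops):
        x = denoise(diffuse(x, alpha_bar, rng), alpha_bar)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((3, 32, 32))  # stand-in for a CIFAR-sized image

    def toy_denoise(x_t, alpha_bar):
        # Toy placeholder that only undoes the signal scaling.
        return x_t / np.sqrt(alpha_bar)

    out = purify(image, toy_denoise, alpha_bar=0.99, loops=3,
                 ranks=(3, 16, 16), rng=rng)
    print(out.shape)
```

With full ranks the projection returns the input unchanged, so the ranks control how much high-frequency content, and with it potentially adversarial noise, is discarded before the loops run.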
