LG AI CR · Jul 13, 2023

MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks

Jiaming Zhang, Lingyu Qiu, Qi Yi, Yige Li, Jitao Sang, Changsheng Xu, Dit-Yan Yeung

arXiv:2307.06608v33.81 citations · h-index: 32

Originality: Highly original

AI Analysis

The paper addresses the vulnerability of deep neural networks in safety-critical applications by improving no-box attacks, a practical but underexplored threat model, and reports a substantial gain over prior no-box baselines.

The paper tackles no-box adversarial attacks, in which the attacker has no prior knowledge of the target model. It proposes MF-CLIP, a framework that makes CLIP a more effective surrogate model through margin-aware feature space optimization, and reports a 15.23% improvement over baselines on standard models and a 9.52% improvement on adversarially trained models.

The vulnerability of Deep Neural Networks (DNNs) to adversarial attacks poses a significant challenge to their deployment in safety-critical applications. While extensive research has addressed various attack scenarios, the no-box attack setting, where adversaries have no prior knowledge of the target model, including access to its training data, remains relatively underexplored despite its practical relevance. This work presents a systematic investigation into leveraging large-scale Vision-Language Models (VLMs), particularly CLIP, as surrogate models for executing no-box attacks. Our theoretical and empirical analyses reveal a key limitation: vanilla CLIP, applied directly as a surrogate model, lacks the discriminative capability needed to execute no-box attacks effectively. To address this limitation, we propose MF-CLIP, a novel framework that enhances CLIP's effectiveness as a surrogate model through margin-aware feature space optimization. Comprehensive evaluations across diverse architectures and datasets demonstrate that MF-CLIP substantially advances the state of the art in no-box attacks, surpassing existing baselines by 15.23% on standard models and achieving a 9.52% improvement on adversarially trained models. Our code will be made publicly available to facilitate reproducibility and future research in this direction.
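The abstract does not spell out the margin-aware objective, so the following is only an illustrative sketch of what a margin on feature-space similarities can look like. It assumes a generic hinge-style margin on cosine similarities between one image feature and a set of per-class text features; the function names, the toy vectors, and the `margin` value are hypothetical and are not taken from MF-CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_loss(image_feat, text_feats, true_idx, margin=0.2):
    """Generic hinge-style margin loss (an assumption, not the paper's loss).

    The loss is zero once the similarity to the true class exceeds the
    best competing class by at least `margin`; otherwise it equals the
    remaining shortfall.
    """
    sims = [cosine(image_feat, t) for t in text_feats]
    true_sim = sims[true_idx]
    best_other = max(s for i, s in enumerate(sims) if i != true_idx)
    return max(0.0, margin - (true_sim - best_other))


# Toy stand-ins for per-class text embeddings (hypothetical values).
text_feats = [[1.0, 0.0], [0.0, 1.0]]

well_separated = margin_loss([1.0, 0.0], text_feats, true_idx=0)
ambiguous = margin_loss([1.0, 1.0], text_feats, true_idx=0)
```

In this toy setup, an image feature aligned with its class incurs no loss, while a feature equally similar to both classes is penalized by the full margin. A surrogate trained to shrink such a penalty would be more discriminative in feature space, which is the property the abstract says vanilla CLIP lacks.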
