Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation
This work addresses the efficiency of black-box adversarial attacks for security applications, though it is incremental as it builds on existing knowledge distillation and attack methods.
The paper tackles the problem of generating transferable adversarial examples by using knowledge distillation from multiple teacher models, achieving attack success rates comparable to ensemble methods while reducing generation time by up to a factor of six.
We investigate whether knowledge distillation (KD) from multiple heterogeneous teacher models can enhance the generation of transferable adversarial examples. A lightweight student model is trained using two KD strategies, curriculum-based switching and joint optimization, with ResNet50 and DenseNet-161 as teachers. The trained student is then used to generate adversarial examples with the fast gradient (FG), fast gradient sign (FGS), and projected gradient descent (PGD) attacks, which are evaluated against a black-box target model (GoogLeNet). Our results show that student models distilled from multiple teachers achieve attack success rates comparable to ensemble-based baselines, while reducing adversarial example generation time by up to a factor of six. An ablation study further reveals that lower temperature settings and the inclusion of hard-label supervision significantly enhance transferability. These findings suggest that KD can serve not only as a model compression technique but also as a powerful tool for improving the efficiency and effectiveness of black-box adversarial attacks.
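As a rough illustration of the pipeline described above, the NumPy sketch below shows one plausible form of a joint multi-teacher distillation loss (hard-label cross-entropy plus temperature-softened KL terms averaged over teachers) and a single fast gradient sign step on the student. The abstract does not give the loss formula, so the weighting `alpha`, the `temperature ** 2` scaling, the averaging over teachers, and the linear stand-in student in `fgs_linear` are all assumptions made for illustration. This is not the authors' implementation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean hard-label cross-entropy over a batch."""
    log_p = np.log(softmax(logits) + 1e-12)
    return -log_p[np.arange(len(labels)), labels].mean()

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=4.0, alpha=0.5):
    """Assumed joint objective: alpha * CE + (1 - alpha) * T^2 * mean_k KL(teacher_k || student)."""
    log_ps = np.log(softmax(student_logits, temperature) + 1e-12)
    kl_terms = []
    for teacher_logits in teacher_logits_list:
        pt = softmax(teacher_logits, temperature)
        kl = (pt * (np.log(pt + 1e-12) - log_ps)).sum(axis=-1).mean()
        kl_terms.append(kl)
    soft = (temperature ** 2) * np.mean(kl_terms)
    return alpha * cross_entropy(student_logits, labels) + (1 - alpha) * soft

def fgs_linear(x, labels, W, b, eps):
    """One fast gradient sign step against a hypothetical linear student (logits = x @ W + b)."""
    p = softmax(x @ W + b)
    onehot = np.eye(W.shape[1])[labels]
    grad_x = (p - onehot) @ W.T        # gradient of the cross-entropy w.r.t. the input
    return x + eps * np.sign(grad_x)   # image pipelines would usually also clip to the valid range

# Toy usage with random stand-ins for the student, two teachers, and the inputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))
labels = np.array([0, 1, 2, 1])
W, b = rng.normal(size=(5, 3)), np.zeros(3)
teachers = [rng.normal(size=(4, 3)), rng.normal(size=(4, 3))]
loss = multi_teacher_kd_loss(x @ W + b, teachers, labels, temperature=2.0, alpha=0.5)
x_adv = fgs_linear(x, labels, W, b, eps=0.1)
print(loss, np.abs(x_adv - x).max())
```

In this sketch, setting `alpha` to zero removes hard-label supervision and `temperature` controls how soft the teacher targets are. These are the two knobs the ablation study refers to, although the parameterisation in the paper itself may differ.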