LTD: Low Temperature Distillation for Gradient Masking-free Adversarial Training
This work improves the adversarial robustness of image classification models by refining the one-hot label assumption, which the authors identify as a source of vulnerability in adversarial training.
The paper addresses the vulnerability of adversarially trained neural networks by targeting the imprecision of one-hot labels under data ambiguity. It introduces Low-Temperature Distillation (LTD) to refine label representations while avoiding gradient masking. Without extra data, LTD achieves robust accuracy of 58.19% on CIFAR-10, 31.13% on CIFAR-100, and 42.08% on ImageNet.
Adversarial training is a widely adopted strategy to bolster the robustness of neural network models against adversarial attacks. This paper revisits the fundamental assumptions underlying image classification and suggests that representing data as one-hot labels is a key factor that leads to vulnerabilities. In real-world datasets, data ambiguity often arises, with samples exhibiting characteristics of multiple classes, which renders one-hot label representations imprecise. To address this, we introduce a novel approach, Low-Temperature Distillation (LTD), designed to refine label representations. Unlike previous approaches, LTD uses a relatively low temperature in the teacher model while keeping the student model's temperature fixed during both training and inference. This strategy not only refines assumptions about the data distribution but also strengthens model robustness and avoids the gradient masking problem commonly encountered in defensive distillation. Experimental results demonstrate the efficacy of the proposed method when combined with existing frameworks, achieving robust accuracy of 58.19%, 31.13%, and 42.08% on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively, without the need for additional data.
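As a rough illustration of the temperature mechanism described in the abstract, the Python sketch below shows how a teacher's temperature shapes the soft labels a student could be trained on. It is not the paper's implementation. The function names, the example logits, and the temperature values are hypothetical, and the loss form (cross-entropy between teacher soft labels and the student's fixed-temperature prediction) is an assumption, since the abstract does not state the exact objective.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def soft_label_cross_entropy(student_logits, soft_labels, student_temperature=1.0):
    """Cross-entropy between soft labels and the student's prediction.

    The student temperature stays fixed (assumed 1.0 here) for both
    training and inference, as the abstract describes.
    """
    log_probs = log_softmax(student_logits, student_temperature)
    return -sum(q * lp for q, lp in zip(soft_labels, log_probs))


# Hypothetical teacher logits for an ambiguous sample: class 0 is most
# likely, but class 1 is a plausible alternative.
teacher_logits = [2.0, 1.5, -1.0]

# A low teacher temperature (assumed value) gives sharper soft labels
# that still keep some probability mass on the second class.
soft_low_t = softmax(teacher_logits, temperature=0.5)

# A high temperature (assumed value) flattens the distribution.
soft_high_t = softmax(teacher_logits, temperature=5.0)

# Hypothetical student logits, scored against the low-temperature labels.
student_logits = [1.0, 0.8, -0.5]
loss = soft_label_cross_entropy(student_logits, soft_low_t)
```

In this sketch, the low-temperature labels sit between a one-hot vector and the flattened high-temperature distribution, which is one way to read the abstract's claim that LTD refines labels without relying on the temperature mismatch between training and inference used in defensive distillation.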