DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization
DAFT addresses the fragility of deep learning models under distribution shift, offering improved out-of-distribution (OOD) generalization for applications that need reliable performance across varied environments. It appears incremental, however, since it builds on existing adversarial training and distillation techniques.
The paper tackles the problem of out-of-distribution (OOD) generalization in deep learning by proposing DAFT, a method that distills knowledge from adversarially fine-tuned teachers to improve robustness. On standard DomainBed benchmarks it outperforms well-tuned ERM and distillation baselines by up to 6%.
We consider the problem of OOD generalization, where the goal is to train a model that performs well on test distributions that differ from the training distribution. Deep learning models are known to be fragile to such shifts and can suffer large accuracy drops even for slightly different test distributions. We propose a new method, DAFT, based on the intuition that an adversarially robust combination of a large number of rich features should provide OOD robustness. Our method carefully distills the knowledge from a powerful teacher that learns several discriminative features using standard training while combining them using adversarial training. The standard adversarial training procedure is modified to produce teachers that can guide the student better. We evaluate DAFT on standard benchmarks in the DomainBed framework, and demonstrate that DAFT achieves significant improvements over the current state-of-the-art OOD generalization methods. DAFT consistently outperforms well-tuned ERM and distillation baselines by up to 6%, with more pronounced gains for smaller networks.
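The two ingredients named in the abstract, an adversarially trained combination of features and teacher-to-student distillation, can be illustrated with a minimal sketch. It is not the paper's procedure: the abstract does not give the loss or the modified adversarial training, so the code assumes a generic temperature-scaled distillation loss and, for the adversarial part, the closed-form worst-case L-infinity perturbation against a linear binary head. All function names and the parameters `alpha`, `temperature`, and `eps` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Standard cross-entropy of the true label under softmax(logits)."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Assumed generic distillation objective (not taken from the paper):
    alpha * hard-label loss + (1 - alpha) * T^2 * KL(teacher || student)."""
    hard = cross_entropy(student_logits, label)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def worst_case_input(x, w, y, eps):
    """Worst-case L-infinity perturbation of size eps against a linear
    binary head with weights w and label y in {-1, +1}. It lowers the
    margin y * <w, x> by eps * ||w||_1. Illustrative assumption only."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]

def margin(x, w, y):
    """Signed margin y * <w, x> of a linear binary classifier."""
    return y * sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical usage: one example, made-up logits and weights.
x, w, y = [1.0, -2.0], [0.5, -1.0], 1
x_adv = worst_case_input(x, w, y, eps=0.1)
print("clean margin:", margin(x, w, y), "adversarial margin:", margin(x_adv, w, y))
print("distillation loss:", distillation_loss([0.2, 1.5], [0.1, 2.0], label=1))
```

In a setup of this kind, the teacher's feature extractor would be trained normally and only the combining head trained on perturbed inputs such as `x_adv`, after which the student minimizes `distillation_loss` against the teacher's logits. How DAFT actually modifies adversarial training to produce better teachers is not specified in the abstract.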