Revisiting Mixout: An Overlooked Path to Robust Finetuning
This work addresses the loss of robustness that finetuning causes in vision models under distribution shift, offering a practical solution for researchers and practitioners, although the contribution is incremental because it builds on the existing Mixout regularizer.
The paper tackles the loss of robustness under distribution shift that occurs when finetuning vision foundation models. It revisits Mixout and introduces GMixout, which improves in-domain accuracy beyond zero-shot performance and surpasses baselines such as Model Soups and parameter-efficient finetuning on benchmarks including ImageNet, DomainNet, and CIFAR100-C.
Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretrained reference, through the lens of a single-run, weight-sharing implicit ensemble. This perspective reveals three key levers that govern robustness: the masking anchor, the resampling frequency, and the mask sparsity. Guided by this analysis, we introduce GMixout, which (i) replaces the fixed anchor with an exponential moving average (EMA) snapshot that adapts during training, and (ii) regulates the masking period via an explicit resampling-frequency hyperparameter. Our sparse-kernel implementation updates only a small fraction of parameters and adds no inference-time overhead, enabling training on consumer-grade GPUs. In experiments on benchmarks covering covariate shift, corruption, and class imbalance (ImageNet / ImageNet-LT, DomainNet, iWildCam, and CIFAR100-C), GMixout consistently improves in-domain accuracy beyond zero-shot performance while surpassing both Model Soups and strong parameter-efficient finetuning baselines under distribution shift.
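To make the three levers concrete, here is a minimal plain-Python sketch, not the paper's implementation. It assumes the rescaled mixing rule from the original Mixout paper, w' = ((1 - M) * w + M * u - p * u) / (1 - p) with M ~ Bernoulli(p), where u is the anchor. It then layers on the two GMixout changes described in the abstract: an EMA anchor and a mask held fixed for a set number of steps. The class `EMAAnchoredMixout`, its parameters `p`, `ema_decay`, and `resample_every`, the choice to update the EMA from the raw finetuned weights, and the reading of "sparse updates" as "only unmasked entries receive gradient" are all hypothetical assumptions made for illustration.

```python
import random
from typing import List

Vector = List[float]


def sample_mask(n: int, p: float, rng: random.Random) -> List[bool]:
    """Draw M ~ Bernoulli(p) per weight; True means 'swap in the anchor'."""
    return [rng.random() < p for _ in range(n)]


def mixout(weights: Vector, anchor: Vector, mask: List[bool], p: float) -> Vector:
    """Rescaled Mixout rule: w' = ((1 - M) * w + M * u - p * u) / (1 - p).

    Masked entries collapse to the anchor u; unmasked entries become
    (w - p * u) / (1 - p), so the mask-averaged weight E[w'] equals w.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return [
        u if m else (w - p * u) / (1.0 - p)
        for w, u, m in zip(weights, anchor, mask)
    ]


class EMAAnchoredMixout:
    """HYPOTHETICAL sketch of two GMixout levers named in the abstract:
    an EMA anchor instead of the fixed pretrained weights, and a mask that
    is held fixed for `resample_every` steps before being redrawn.
    Names and defaults are illustrative, not the paper's implementation."""

    def __init__(self, pretrained: Vector, p: float = 0.9,
                 ema_decay: float = 0.999, resample_every: int = 10,
                 seed: int = 0) -> None:
        self.anchor = list(pretrained)  # anchor starts at the pretrained weights
        self.p = p                      # probability of swapping in the anchor
        self.ema_decay = ema_decay
        self.resample_every = resample_every
        self.rng = random.Random(seed)
        self.step = 0
        self.mask = sample_mask(len(pretrained), p, self.rng)

    def forward_weights(self, weights: Vector) -> Vector:
        """Mixed weights used in the forward pass at the current step."""
        return mixout(weights, self.anchor, self.mask, self.p)

    def trainable_indices(self) -> List[int]:
        """Unmasked entries, the only ones whose mixed value depends on w and
        so the only ones that receive gradient (assumed to be what a sparse
        kernel would update: roughly a 1 - p fraction of the weights)."""
        return [i for i, m in enumerate(self.mask) if not m]

    def end_step(self, weights: Vector) -> None:
        """Move the EMA anchor toward the current weights (assumption), then
        redraw the mask once every `resample_every` steps."""
        d = self.ema_decay
        self.anchor = [d * a + (1.0 - d) * w for a, w in zip(self.anchor, weights)]
        self.step += 1
        if self.step % self.resample_every == 0:
            self.mask = sample_mask(len(weights), self.p, self.rng)


if __name__ == "__main__":
    pretrained = [0.5, -1.0, 2.0, 0.0]
    finetuned = [0.7, -0.8, 1.5, 0.3]
    reg = EMAAnchoredMixout(pretrained, p=0.5, ema_decay=0.9, resample_every=2)
    print("forward weights:", reg.forward_weights(finetuned))
    print("trainable this step:", reg.trainable_indices())
    reg.end_step(finetuned)
    print("EMA anchor after one step:", reg.anchor)
```

In this sketch, a large `p` makes the update sparse because most entries are pinned to the anchor. `resample_every` sets how many steps each "ensemble member" (one fixed mask) is trained before a new one is drawn. Because the rule keeps E[w'] = w, the sketch would simply use the plain `weights` at inference time.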