LG · Mar 22, 2023
Revisiting DeepFool: generalization and improvement
Alireza Abdollahpoorrostam, Mahed Abroshan, Seyed-Mohsen Moosavi-Dezfooli
Deep neural networks are known to be vulnerable to adversarial examples: slightly modified inputs that fool the network into making incorrect predictions. This has led to a significant amount of research on evaluating the robustness of these networks against such perturbations. One particularly important robustness metric is the robustness to minimal $\ell_2$ adversarial perturbations. However, existing methods for evaluating this metric are either computationally expensive or inaccurate. In this paper, we introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency. Our proposed attacks are generalizations of the well-known DeepFool (DF) attack, yet they remain simple to understand and implement. We demonstrate that our attacks outperform existing methods in terms of both effectiveness and computational efficiency. Our proposed attacks are also suitable for evaluating the robustness of large models and can be used to perform adversarial training (AT) to achieve state-of-the-art robustness to minimal $\ell_2$ adversarial perturbations.
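For reference, the baseline that these attacks generalize can be sketched in a few lines. The NumPy code below is a minimal sketch of the original multiclass DeepFool iteration, not the paper's generalized attacks: at each step it linearizes every class boundary around the current point, picks the closest one, and moves onto it with step r = |f'_l| / ||w'_l||^2 * w'_l, where f'_l is the score gap to the original class and w'_l the corresponding gradient difference. The function name `deepfool`, the callables `logits_fn` and `jacobian_fn`, and the `overshoot=0.02` default are illustrative assumptions.

```python
import numpy as np


def deepfool(x0, logits_fn, jacobian_fn, overshoot=0.02, max_iter=50):
    """Minimal sketch of the original multiclass DeepFool attack (hypothetical API).

    logits_fn(x)   -> array (K,): class scores of the model at x
    jacobian_fn(x) -> array (K, d): gradient of each class score w.r.t. x
    Returns a perturbation r such that x0 + r is classified differently
    from x0 (when the loop succeeds within max_iter).
    """
    x0 = np.asarray(x0, dtype=float)
    k0 = int(np.argmax(logits_fn(x0)))  # original predicted label
    r_total = np.zeros_like(x0)
    for _ in range(max_iter):
        x = x0 + (1 + overshoot) * r_total
        f = logits_fn(x)
        if int(np.argmax(f)) != k0:
            break  # label flipped: stop
        J = jacobian_fn(x)
        best_dist, best_step = np.inf, None
        for k in range(len(f)):
            if k == k0:
                continue
            w_k = J[k] - J[k0]  # normal of the linearized boundary between k and k0
            f_k = f[k] - f[k0]  # score gap (non-positive while k0 is predicted)
            norm = np.linalg.norm(w_k)
            if norm == 0:
                continue
            dist = abs(f_k) / norm  # distance to that linearized boundary
            if dist < best_dist:
                # Step onto the closest boundary; the tiny constant avoids stopping exactly on it
                best_dist = dist
                best_step = (abs(f_k) + 1e-8) / norm**2 * w_k
        if best_step is None:
            break
        r_total += best_step
    return (1 + overshoot) * r_total
```

For an affine classifier the linearization is exact, so one step plus the overshoot flips the label; for a deep network, `jacobian_fn` would come from automatic differentiation and several iterations are typically needed.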
LG · Feb 10
Model soups need only one ingredient
Alireza Abdollahpoorrostam, Nikolaos Dimitriadis, Adam Hazimeh et al.
Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses an entropy-based effective rank to automatically reweight these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining many of their benefits without their computational overhead.
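The two ingredients named above, an SVD of a layer's update and an entropy-based effective rank, can be illustrated with a short NumPy sketch. It uses the standard entropy-based effective rank (Roy and Vetterli): normalize the singular values to a distribution p and take exp of its Shannon entropy. The split point `k = ceil(effective_rank)` and the fixed coefficients `alpha_high` and `alpha_low` are hypothetical stand-ins chosen for illustration; MonoSoup computes its layer-wise coefficients automatically, and this sketch does not reproduce that rule.

```python
import numpy as np


def effective_rank(s):
    """Entropy-based effective rank of a vector of singular values s."""
    total = s.sum()
    if total <= 0:
        return 0.0  # zero update: no meaningful directions
    p = s / total
    p = p[p > 0]  # treat 0 * log 0 as 0
    return float(np.exp(-np.sum(p * np.log(p))))


def split_and_reweight(w_pre, w_ft, alpha_high=1.0, alpha_low=0.5):
    """Hypothetical sketch: split a layer update into high/low-energy parts and reweight.

    alpha_high / alpha_low are fixed, user-chosen coefficients (an assumption),
    not the coefficients MonoSoup derives.
    """
    delta = w_ft - w_pre  # the layer's fine-tuning update
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    # Keep roughly "effective rank" many leading directions as the high-energy part
    k = min(max(int(np.ceil(effective_rank(s))), 1), len(s))
    high = (U[:, :k] * s[:k]) @ Vt[:k]  # leading singular directions
    low = (U[:, k:] * s[k:]) @ Vt[k:]   # remaining low-energy directions
    return w_pre + alpha_high * high + alpha_low * low, k
```

Setting both coefficients to 1 recovers the fine-tuned weights and setting both to 0 recovers the pre-trained weights, so the sketch interpolates between those endpoints separately along the two spectral parts.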
LG · Oct 21, 2024 · Code
In Search of the Successful Interpolation: On the Role of Sharpness in CLIP Generalization
Alireza Abdollahpoorrostam
Zero-shot models like CLIP are often fine-tuned on a target dataset to further improve their accuracy, but this can compromise out-of-distribution (OOD) robustness. Robust Fine-Tuning (RFT) (Wortsman et al., 2021), which interpolates between the zero-shot and fine-tuned models, has been proposed to address this issue. However, our understanding of when RFT actually improves OOD error remains limited. In this work, we empirically investigate the robustness of RFT in CLIP models, focusing on the sharpness of the CLIP model during interpolation. First, we demonstrate that sharpness may not serve as a reliable indicator of the OOD generalization of modern architectures like CLIP, which challenges the conventional belief in the generalization benefits of flat minima in foundation models. However, by examining the straggler layer phenomenon, we show that, unlike overall sharpness, the layer-wise sharpness of straggler layers can reliably capture the OOD generalization performance of interpolated CLIP models. Our extensive experiments reveal that layer-wise sharpness correlates with OOD accuracy for RFT. Furthermore, we demonstrate that inducing sparsity in the straggler layers can mitigate the failure mode phenomenon in RFT. To the best of our knowledge, this is the first work to study the role of sharpness in the success of interpolation in the weight space of CLIP foundation models. Our code is available at https://github.com/alirezaabdollahpour/CLIP_Mode_Connectivity.
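A minimal sketch of the two quantities this abstract relates: the interpolated weights and a per-layer sharpness score. It assumes the linear weight-space interpolation theta(alpha) = (1 - alpha) * theta_zs + alpha * theta_ft used by WiSE-FT. The abstract does not say how sharpness is measured, so `layerwise_sharpness` uses an assumed proxy: the average loss increase when only one layer receives a random perturbation whose norm is `rho` times that layer's weight norm. The names `interpolate`, `layerwise_sharpness`, `rho`, and `n_samples`, and the dict-of-arrays model representation, are all hypothetical.

```python
import numpy as np


def interpolate(theta_zs, theta_ft, alpha):
    """Linear weight-space interpolation between zero-shot and fine-tuned weights."""
    return {name: (1 - alpha) * theta_zs[name] + alpha * theta_ft[name] for name in theta_zs}


def layerwise_sharpness(params, loss_fn, layer, rho=0.05, n_samples=20, seed=0):
    """Assumed sharpness proxy: mean loss increase under random perturbations of one layer.

    params: dict mapping layer name -> weight array
    loss_fn: callable(params) -> float
    Each perturbation has norm rho * ||params[layer]||; other layers stay fixed.
    """
    rng = np.random.default_rng(seed)
    base = loss_fn(params)
    w = params[layer]
    increases = []
    for _ in range(n_samples):
        eps = rng.standard_normal(w.shape)
        eps *= rho * np.linalg.norm(w) / (np.linalg.norm(eps) + 1e-12)  # rescale to relative norm rho
        perturbed = dict(params)  # shallow copy; only the chosen layer is replaced
        perturbed[layer] = w + eps
        increases.append(loss_fn(perturbed) - base)
    return float(np.mean(increases))
```

A study along these lines would sweep `alpha` over, say, `np.linspace(0, 1, 11)`, call `layerwise_sharpness` on each layer of `interpolate(theta_zs, theta_ft, alpha)`, and compare the per-layer curves with OOD accuracy rather than relying on a single whole-model sharpness value.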