LG AI CL CV Feb 21, 2025

Directional Gradient Projection for Robust Fine-Tuning of Foundation Models

arXiv:2502.15895v2 11 citations h-index: 48 ICLR
Originality: Incremental advance
AI Analysis

This work addresses the problem of adapting large models to downstream tasks while maintaining robustness to distribution shifts, which is crucial for real-world AI applications, though it appears incremental by building on existing regularization approaches.

The paper tackles robust fine-tuning of foundation models by proposing Directional Gradient Projection (DiGraP), a layer-wise method that uses gradient direction to improve regularization and optimization, resulting in consistent performance gains over baselines in image classification and visual question answering tasks for both in-distribution and out-of-distribution robustness.

Robust fine-tuning aims to adapt large foundation models to downstream tasks while preserving their robustness to distribution shifts. Existing methods primarily focus on constraining and projecting the current model towards the pre-trained initialization based on the magnitude of the difference between fine-tuned and pre-trained weights. These approaches often require extensive hyper-parameter tuning and can sometimes result in underfitting. In this work, we propose Directional Gradient Projection (DiGraP), a novel layer-wise trainable method that incorporates directional information from gradients to bridge regularization and multi-objective optimization. Besides demonstrating our method on image classification, as another contribution we generalize this area to multi-modal evaluation settings for robust fine-tuning. Specifically, we first bridge the uni-modal and multi-modal gap by performing analysis on image classification benchmarks reformulated as Visual Question Answering (VQA), and we further categorize ten out-of-distribution (OOD) VQA datasets by distribution shift type and degree (i.e., near versus far OOD). Experimental results show that DiGraP consistently outperforms existing baselines across image classification and VQA tasks with discriminative and generative backbones, improving both in-distribution (ID) generalization and OOD robustness.
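To make the idea of using gradient direction alongside a pull towards the pre-trained weights more concrete, here is a minimal sketch of a generic conflict-aware gradient projection, applied one layer at a time. It is not taken from the paper and should not be read as DiGraP itself. It assumes the regularizer is the squared distance to the pre-trained weights, and it uses a fixed `strength` value where the abstract describes a trainable method. The function names `project_if_conflicting` and `layerwise_step` are hypothetical.

```python
import numpy as np


def project_if_conflicting(task_grad, reg_grad, strength=1.0):
    """Remove the part of task_grad that opposes reg_grad.

    Assumption: two gradients "conflict" when their dot product is negative.
    With strength=1.0 the result is orthogonal to reg_grad.
    """
    dot = float(np.dot(task_grad, reg_grad))
    norm_sq = float(np.dot(reg_grad, reg_grad))
    if dot >= 0.0 or norm_sq == 0.0:
        # No conflict (or no regularization signal): keep the task gradient.
        return task_grad.copy()
    return task_grad - strength * (dot / norm_sq) * reg_grad


def layerwise_step(weights, pretrained, task_grads, lr=0.1, strength=1.0):
    """One gradient-descent step, projecting each layer independently.

    Assumption: the regularizer is 0.5 * ||w - w0||^2 per layer, so its
    gradient is simply (w - w0).
    """
    updated = {}
    for name, w in weights.items():
        reg_grad = w - pretrained[name]
        g = project_if_conflicting(
            task_grads[name].ravel(), reg_grad.ravel(), strength
        )
        updated[name] = w - lr * g.reshape(w.shape)
    return updated


# Toy usage: a single "layer" that has drifted away from its initialization.
weights = {"layer": np.array([0.0, 2.0])}
pretrained = {"layer": np.zeros(2)}
task_grads = {"layer": np.array([1.0, -1.0])}
new_weights = layerwise_step(weights, pretrained, task_grads)
```

In this toy case the task gradient would push the second coordinate further from the pre-trained value, so that component is removed and only the first coordinate moves. Because the projection depends on the sign of a dot product rather than on the size of the weight difference, it acts only when the two objectives disagree, which is one plausible reading of "directional information" in the abstract.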
