CL, AI, LG | Jun 20, 2024

Information Guided Regularization for Fine-tuning Language Models

arXiv:2406.14005v2 | 1 citation
AI Analysis

This work targets smoother transfer learning for language model practitioners. Its contribution is incremental, since it builds on existing fine-tuning methods.

The paper aims to improve regularization during language model fine-tuning. It proposes guided dropout, which uses insights from the pretraining loss landscape to improve downstream generalization, and it reports consistently better performance than standard baselines, even with limited data.

The pretraining-fine-tuning paradigm has been the de facto strategy for transfer learning in modern language modeling. With the understanding that task adaptation in LMs is often a function of parameters shared across tasks, we argue that a more surgical approach to regularization needs to exist for smoother transfer learning. Towards this end, we investigate how the pretraining loss landscape is affected by these task-sensitive parameters through an information-theoretic lens. We then leverage the findings from our investigations to devise a novel approach to dropout for improved model regularization and better downstream generalization. This approach, named guided dropout, is both task & architecture agnostic and adds no computational overhead to the fine-tuning process. Through empirical evaluations, we showcase that our approach to regularization yields consistently better performance, even in scenarios of data paucity, compared to standardized baselines.
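The abstract does not spell out how guided dropout assigns dropout probabilities, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each layer already has a scalar information score (for example, one estimated offline from the pretrained model) and maps higher scores to lower dropout rates. The function names, the linear mapping, and the `base_p` and `spread` parameters are all hypothetical choices made for this example.

```python
import random


def layer_dropout_rates(info_scores, base_p=0.1, spread=0.05):
    """Map per-layer information scores to dropout probabilities.

    Hypothetical rule (an assumption, not taken from the paper): the most
    informative layer gets base_p - spread, the least informative layer
    gets base_p + spread, with linear interpolation in between.
    """
    if not 0.0 <= spread <= base_p or base_p + spread >= 1.0:
        raise ValueError("need 0 <= spread <= base_p and base_p + spread < 1")
    lo, hi = min(info_scores), max(info_scores)
    if hi == lo:
        # All layers look equally informative: fall back to uniform dropout.
        return [base_p] * len(info_scores)
    rates = []
    for score in info_scores:
        norm = (score - lo) / (hi - lo)  # 0 = least informative, 1 = most
        rates.append(base_p + spread * (1.0 - 2.0 * norm))
    return rates


def dropout(values, p, rng):
    """Standard inverted dropout on a list of activations.

    Each value is zeroed with probability p; survivors are scaled by
    1 / (1 - p) so the expected activation is unchanged.
    """
    if p <= 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    scores = [1.0, 3.0, 5.0]  # made-up per-layer scores, for illustration only
    rates = layer_dropout_rates(scores)
    rng = random.Random(0)
    for layer, p in enumerate(rates):
        print(layer, round(p, 3), dropout([1.0, 1.0, 1.0, 1.0], p, rng))
```

Because the rates are computed once before fine-tuning and each layer still applies ordinary dropout, a scheme of this shape adds no per-step cost, which is in keeping with the abstract's claim of no computational overhead.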

Code Implementations: 1 repo