LG · AI · Jun 26, 2025

Pay Attention to Small Weights

arXiv:2506.21374v2 · 3 citations · h-index: 16
Originality: Incremental advance
AI Analysis

The paper addresses the high memory and compute demands that practitioners face when finetuning, though the advance is incremental because it builds on existing parameter-efficient finetuning approaches.

The paper tackles the resource-intensive nature of finetuning large pretrained neural networks by proposing NANOADAM, a method that dynamically updates only small-magnitude weights, which reduces memory and computational costs while improving generalization performance in NLP and vision tasks.

Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free -- the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; third, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
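
The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' NANOADAM implementation: the abstract does not specify the fraction of weights updated, how often the subset is refreshed, or any other optimizer details, so the helper names `small_weight_mask` and `masked_adam_step`, the `fraction` parameter, and the toy values are all hypothetical. The sketch only shows the two ideas stated above: the mask is computed from weight magnitudes alone (no gradients needed), and a standard Adam update is applied to the selected small-magnitude entries while large-magnitude weights stay untouched.

```python
import math


def small_weight_mask(weights, fraction):
    """Mark the `fraction` of entries with the smallest absolute value.

    Gradient-free: only the weights themselves are inspected.
    (`fraction` is an assumed hyperparameter, not taken from the paper.)
    """
    k = max(1, int(len(weights) * fraction))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    chosen = set(order[:k])
    return [i in chosen for i in range(len(weights))]


def masked_adam_step(weights, grads, m, v, mask, step,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update, applied in place only where mask is True."""
    for i, selected in enumerate(mask):
        if not selected:
            continue  # large-magnitude weights are left unchanged
        m[i] = beta1 * m[i] + (1 - beta1) * grads[i]
        v[i] = beta2 * v[i] + (1 - beta2) * grads[i] ** 2
        m_hat = m[i] / (1 - beta1 ** step)  # bias-corrected first moment
        v_hat = v[i] / (1 - beta2 ** step)  # bias-corrected second moment
        weights[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return weights


# Toy example with made-up numbers.
weights = [0.9, -0.02, 0.5, 0.01, -1.2, 0.03]
grads = [0.1, 0.4, -0.2, -0.3, 0.05, 0.2]
mask = small_weight_mask(weights, fraction=0.5)
m = [0.0] * len(weights)
v = [0.0] * len(weights)
masked_adam_step(weights, grads, m, v, mask, step=1, lr=0.01)
```

For simplicity the sketch allocates moment buffers `m` and `v` for every weight. Memory savings of the kind the abstract describes would presumably come from storing optimizer state only for the selected entries, and a dynamic variant would recompute the mask periodically as weights change.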

