LG | AI | CL | OC | ML | Feb 27, 2024

Variational Learning is Effective for Large Deep Networks

arXiv:2402.17641v2 | 57 citations | h-index: 21 | ICML
AI Analysis

The paper challenges the common assumption that variational learning does not scale to large networks, which could improve training and uncertainty estimation for large-scale models. The contribution appears incremental, since it builds on existing variational methods.

The paper tackles the belief that variational learning is ineffective for large neural networks. It shows that the Improved Variational Online Newton (IVON) optimizer matches or outperforms Adam when training models such as GPT-2 and ResNets, gives better predictive uncertainty, and opens new use cases in finetuning, model merging, generalization error prediction, and data sensitivity estimation.

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective.
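For intuition, the kind of update a variational online Newton-style optimizer performs can be sketched in a few lines of NumPy. This is a simplified illustration on a toy quadratic, modeled loosely on the update described for IVON, and not the authors' implementation. The function names, the hyperparameter names and values (`lr`, `beta1`, `beta2`, `ess`, `weight_decay`), and the toy loss are all assumptions made for this example.

```python
import numpy as np


def variational_newton_step(m, h, g, grad_fn, rng, t,
                            lr=0.1, beta1=0.9, beta2=0.999,
                            ess=100.0, weight_decay=1e-3):
    """One update of a diagonal Gaussian q = N(m, diag(sigma^2)).

    m: posterior mean, h: running curvature estimate, g: gradient momentum.
    All hyperparameter names and defaults are illustrative assumptions.
    """
    # Posterior standard deviation implied by the current curvature estimate.
    sigma = 1.0 / np.sqrt(ess * (h + weight_decay))
    # Sample weights from q and evaluate the gradient at the sample.
    eps = rng.standard_normal(m.shape)
    theta = m + sigma * eps
    g_hat = grad_fn(theta)
    # Curvature estimate from the same gradient: g_hat * (theta - m) / sigma^2.
    h_hat = g_hat * eps / sigma
    # Moving average of gradients.
    g = beta1 * g + (1.0 - beta1) * g_hat
    # Moving average of curvature; the quadratic term keeps h + weight_decay > 0.
    h = (beta2 * h + (1.0 - beta2) * h_hat
         + 0.5 * (1.0 - beta2) ** 2 * (h - h_hat) ** 2 / (h + weight_decay))
    # Bias-corrected, curvature-preconditioned step on the mean.
    g_bar = g / (1.0 - beta1 ** t)
    m = m - lr * (g_bar + weight_decay * m) / (h + weight_decay)
    return m, h, g


# Toy problem (an assumption for illustration): a separable quadratic loss.
CURV = np.array([1.0, 4.0])
TARGET = np.array([2.0, -1.0])


def toy_loss(theta):
    return 0.5 * np.sum(CURV * (theta - TARGET) ** 2)


def toy_grad(theta):
    return CURV * (theta - TARGET)


def run_toy(steps=500, seed=0, ess=100.0, weight_decay=1e-3):
    rng = np.random.default_rng(seed)
    m, h, g = np.zeros(2), np.ones(2), np.zeros(2)
    for t in range(1, steps + 1):
        m, h, g = variational_newton_step(
            m, h, g, toy_grad, rng, t, ess=ess, weight_decay=weight_decay)
    # Final per-parameter posterior standard deviation.
    sigma = 1.0 / np.sqrt(ess * (h + weight_decay))
    return m, sigma
```

Calling `m, sigma = run_toy()` returns the learned mean together with a per-parameter standard deviation. The second output is what separates a variational optimizer from Adam in this sketch: the same moving average that preconditions the step also defines a posterior from which weights can be sampled for uncertainty estimates.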

Code Implementations: 1 repo
