MLLG Jun 6, 2023

Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels

arXiv:2306.03968v1, 17 citations, h-index: 169, Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of inefficient hyperparameter tuning for deep learning practitioners by providing a more scalable method, though it is incremental as it builds on existing Laplace approximation techniques.

The paper tackles the scalability issue of Bayesian hyperparameter optimization in deep learning by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. These bounds enable stochastic-gradient-based optimization and let estimation accuracy be traded against computational cost. The result is a significant acceleration of gradient-based hyperparameter optimization, as demonstrated experimentally.

Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations makes it possible to optimize such hyperparameters just like standard neural network parameters, using gradients computed on the training data. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and allow trading off estimation accuracy against computational complexity. We derive them using the function-space form of the linearized Laplace approximation, which can be estimated using the neural tangent kernel. Experimentally, we show that the estimators can significantly accelerate gradient-based hyperparameter optimization.
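
To make the idea of a batch-wise lower bound concrete, the following is a minimal NumPy sketch. It is not the paper's implementation. It assumes a toy regression setting with unit observation noise, an isotropic Gaussian prior with precision `prior_prec`, and a random matrix `J` standing in for the network Jacobian; the function names are hypothetical. In this setting the log-determinant term of the linearized Laplace approximation can be written in function space as log det(I + K) with the kernel K = J Jᵀ / prior_prec. Replacing K by its block-diagonal part over data batches can only increase the log-determinant (Fischer's inequality), so the term -½ log det(...) entering the log marginal likelihood is bounded from below, and each block depends on one batch only.

```python
import numpy as np

def function_space_logdet(J, prior_prec):
    """log det(I_N + J J^T / prior_prec) for an (N, P) Jacobian stand-in J."""
    K = J @ J.T / prior_prec  # NTK-like kernel matrix, shape (N, N)
    _, logdet = np.linalg.slogdet(np.eye(K.shape[0]) + K)
    return logdet

def parameter_space_logdet(J, prior_prec):
    """Same quantity in parameter space: log det(I_P + J^T J / prior_prec)."""
    G = J.T @ J / prior_prec  # GGN-like matrix, shape (P, P)
    _, logdet = np.linalg.slogdet(np.eye(G.shape[0]) + G)
    return logdet

def blockwise_logdet(J, prior_prec, batch_size):
    """Sum of per-batch log-determinants (block-diagonal kernel)."""
    total = 0.0
    for start in range(0, J.shape[0], batch_size):
        total += function_space_logdet(J[start:start + batch_size], prior_prec)
    return total

rng = np.random.default_rng(0)
J = rng.normal(size=(12, 5))  # hypothetical: 12 data points, 5 parameters
prior_prec = 1.0

exact = function_space_logdet(J, prior_prec)
bound = blockwise_logdet(J, prior_prec, batch_size=4)

# The block-wise value is never smaller than the exact one, so
# -0.5 * bound <= -0.5 * exact, i.e. a lower bound on that term.
print(f"exact log det: {exact:.4f}, block-wise log det: {bound:.4f}")
```

Because the block-wise sum decomposes over batches, sampling one batch and scaling by the number of batches gives an unbiased estimate of the bound (not of the exact term), which is what makes stochastic gradients possible in this sketch. Larger batches tighten the bound at higher cost, and a single batch containing all data recovers the exact value.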

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
