Scalable Linearized Laplace Approximation via Surrogate Neural Kernel
This work addresses the challenge of computing predictive uncertainty for large-scale pre-trained DNNs, offering a scalable solution that incrementally improves on existing LLA approximations.
The paper approximates the kernel of the Linearized Laplace Approximation (LLA) in a scalable way: a surrogate deep neural network replicates the Neural Tangent Kernel (NTK) without computing large Jacobians. This yields similar or improved uncertainty estimation and calibration, and biasing the learned kernel significantly enhances out-of-distribution detection.
We introduce a scalable method to approximate the kernel of the Linearized Laplace Approximation (LLA). To this end, we use a surrogate deep neural network (DNN) that learns a compact feature representation whose inner product replicates the Neural Tangent Kernel (NTK). This avoids the need to compute large Jacobians. Training relies solely on efficient Jacobian-vector products, which makes it possible to compute predictive uncertainty on large-scale pre-trained DNNs. Experimental results show similar or improved uncertainty estimation and calibration compared to existing LLA approximations. Moreover, biasing the learned kernel significantly enhances out-of-distribution detection. This highlights the benefit of the proposed method for finding better kernels than the NTK when using LLA to compute predictive uncertainty given a pre-trained DNN.
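To illustrate why Jacobian-vector products alone can carry information about the NTK, the following minimal sketch uses the identity E_v[(J(x) v)(J(x') v)] = J(x) J(x')^T for v drawn from a standard Gaussian. It is not the training procedure of the paper. The toy model `f`, the parameter values, the finite-difference `jvp` helper (a stand-in for a forward-mode JVP), and the sample count are all assumptions made for illustration.

```python
import math
import random

# Toy scalar model f(x; theta) = c * tanh(a * x + b); purely illustrative.
def f(x, theta):
    a, b, c = theta
    return c * math.tanh(a * x + b)

def jvp(x, theta, v, eps=1e-4):
    """Directional derivative of f w.r.t. theta along v (central difference).

    Stands in for a forward-mode JVP; the full Jacobian is never formed.
    """
    plus = [t + eps * d for t, d in zip(theta, v)]
    minus = [t - eps * d for t, d in zip(theta, v)]
    return (f(x, plus) - f(x, minus)) / (2.0 * eps)

def ntk_exact(x1, x2, theta):
    """Reference value: inner product of the analytic parameter gradients."""
    def grad(x):
        a, b, c = theta
        t = math.tanh(a * x + b)
        s = c * (1.0 - t * t)
        return [s * x, s, t]
    return sum(g1 * g2 for g1, g2 in zip(grad(x1), grad(x2)))

def ntk_from_jvps(x1, x2, theta, n_samples=20000, seed=0):
    """Monte Carlo estimate of one NTK entry using only JVPs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Random direction in parameter space, v ~ N(0, I).
        v = [rng.gauss(0.0, 1.0) for _ in theta]
        total += jvp(x1, theta, v) * jvp(x2, theta, v)
    return total / n_samples

theta = [0.5, -0.3, 1.2]
exact = ntk_exact(0.7, -0.4, theta)
estimate = ntk_from_jvps(0.7, -0.4, theta)
print(f"exact NTK entry: {exact:.4f}, JVP estimate: {estimate:.4f}")
```

In a real setting the finite-difference helper would presumably be replaced by a forward-mode autodiff JVP, and the products of JVPs would serve as regression targets for the inner product of the surrogate features, though the exact objective used by the authors is not stated here.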