Scalable Log Determinants for Gaussian Process Kernel Learning
This work addresses a critical scalability issue for practitioners in machine learning and statistics who rely on Gaussian processes and related models: it estimates log determinants and their derivatives using only fast matrix-vector multiplications.
The paper tackles the computational bottleneck of computing log determinants and their derivatives for large positive definite matrices, which arises in Gaussian processes and other applications. It proposes novel O(n) stochastic approximations based on matrix-vector multiplications that enable scalable kernel learning. In the experiments, Lanczos generally outperforms Chebyshev for kernel learning, and surrogate models can be highly efficient and accurate with popular kernels.
For applications as varied as Bayesian neural networks, determinantal point processes, elliptical graphical models, and kernel learning for Gaussian processes (GPs), one must compute the log determinant of an $n \times n$ positive definite matrix and its derivatives, leading to prohibitive $\mathcal{O}(n^3)$ computations. We propose novel $\mathcal{O}(n)$ approaches to estimating these quantities from only fast matrix-vector multiplications (MVMs). These stochastic approximations are based on Chebyshev, Lanczos, and surrogate models, and converge quickly even for kernel matrices that have challenging spectra. We leverage these approximations to develop a scalable Gaussian process approach to kernel learning. We find that Lanczos is generally superior to Chebyshev for kernel learning, and that a surrogate approach can be highly efficient and accurate with popular kernels.
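To make the MVM-only idea concrete, the following is a minimal sketch of one standard way to combine stochastic trace estimation with Lanczos quadrature. It uses the identity $\log\det K = \mathrm{tr}(\log K)$ and approximates each quadratic form $z^\top \log(K)\, z$ from the small tridiagonal matrix that Lanczos produces. The function names (`lanczos_tridiag`, `logdet_slq`), the Rademacher probes, the full reorthogonalization, and all default parameters are assumptions chosen for illustration, not the authors' implementation, and derivative estimation is not covered.

```python
import numpy as np


def lanczos_tridiag(mvm, z, num_steps, tol=1e-10):
    """Run Lanczos from start vector z using only matrix-vector products.

    Returns the diagonal (alpha) and off-diagonal (beta) entries of the
    tridiagonal matrix T. Hypothetical helper, written for illustration.
    """
    n = z.shape[0]
    Q = np.zeros((n, num_steps))
    alpha, beta = [], []
    Q[:, 0] = z / np.linalg.norm(z)
    for j in range(num_steps):
        w = mvm(Q[:, j])
        alpha.append(Q[:, j] @ w)
        # Full reorthogonalization against every Lanczos vector so far.
        # This costs O(n * j) per step but keeps the sketch numerically simple.
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        b = np.linalg.norm(w)
        if j == num_steps - 1 or b < tol:
            break  # finished, or the Krylov subspace is exhausted
        beta.append(b)
        Q[:, j + 1] = w / b
    return np.array(alpha), np.array(beta)


def logdet_slq(mvm, n, num_probes=20, num_steps=25, seed=0):
    """Estimate log det(K) for symmetric positive definite K given only v -> K v."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe, ||z||^2 = n
        alpha, beta = lanczos_tridiag(mvm, z, num_steps)
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, V = np.linalg.eigh(T)
        # Gauss quadrature: nodes are eigenvalues of T, weights are the
        # squared first components of its eigenvectors.
        weights = V[0, :] ** 2
        total += n * np.sum(weights * np.log(theta))
    return total / num_probes


if __name__ == "__main__":
    # Small demo: RBF kernel plus noise on 1-D inputs (dense here only so the
    # exact value is available for comparison).
    x = np.linspace(0.0, 1.0, 300)
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2) + 0.1 * np.eye(300)
    print("estimate:", logdet_slq(lambda v: K @ v, n=300))
    print("exact:   ", np.linalg.slogdet(K)[1])
```

In this sketch the matrix enters only through `mvm`, so any structure that makes $Kv$ fast can be plugged in without changing the estimator. The result is stochastic: its accuracy depends on the number of probes and Lanczos steps, and the dense matrix in the demo exists only to provide a reference value.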