Variational Free Energy Pivot Selection for Pivoted Cholesky
For practitioners of Gaussian process regression and other kernel methods, this provides a principled pivot selection that directly targets the downstream objective rather than trace-norm error, leading to better low-rank approximations.
The authors propose a pivot selection rule for pivoted Cholesky that maximizes the exact one-step change of a variational free energy functional, achieving improved objective values and predictive accuracy at low to moderate ranks compared to classical and randomly pivoted Cholesky, while matching asymptotic complexity.
Pivoted Cholesky factorizations construct low-rank approximations of symmetric positive definite matrices by sequentially selecting pivots from the residual diagonal. Classical greedy and randomized rules, such as randomly pivoted Cholesky, target the algebraic trace-norm error of the residual. In many applications, however, the matrix enters a nonlinear matrix functional whose value, not the trace-norm error, determines solution quality, and residual-based rules ignore this structure. We derive a pivot rule that maximizes the exact one-step change of such a functional under Cholesky-consistent rank-1 updates, for a functional combining log-determinant, quadratic, and trace terms. This functional arises as the variational free energy in Gaussian process regression, where the matrix is a kernel matrix. The resulting per-step gain admits a closed-form additive decomposition into complexity, data-fit, and trace contributions, and is used directly as a pivot-selection criterion. We refer to the resulting method as $Δ$-VFE pivoted Cholesky. At each iteration, the criterion is evaluated on a batch of $s$ candidate pivots sampled proportionally to the residual diagonal via incremental Woodbury updates, at a total cost of $\mathcal{O}(snr^2)$ for an $n\times n$ matrix and target rank $r$. This matches the asymptotic complexity of randomly pivoted Cholesky up to the batch factor $s$. Cholesky-consistent rank-1 updates yield monotonically non-decreasing functional values, and the proposed rule maximizes the per-step gain among them. Numerical experiments show improved objective values and predictive accuracy at low to moderate ranks compared to classical and randomly pivoted Cholesky, while preserving trace-norm approximation quality.