Direct loss minimization algorithms for sparse Gaussian processes
This work addresses gradient estimation for direct loss minimization in sparse Gaussian processes, offering incremental technical advances of interest to machine learning practitioners.
The paper tackles the challenge of applying Direct Loss Minimization (DLM) to sparse Gaussian processes, particularly in non-conjugate cases where gradient estimation is difficult. It proposes unbiased product sampling (uPS) and analyzes biased Monte Carlo (bMC) estimates. DLM yields significant performance improvements. Of the two sampling methods, uPS is potentially more sample-efficient, while bMC offers a better trade-off between convergence time and computational efficiency.
The paper provides a thorough investigation of direct loss minimization (DLM), which optimizes the posterior to minimize predictive loss, in sparse Gaussian processes. For the conjugate case, the paper considers DLM for log-loss and DLM for square loss, showing a significant performance improvement in both cases. The application of DLM in non-conjugate cases is more complex because the logarithm of an expectation in the log-loss DLM objective is often intractable, and simple sampling leads to biased estimates of gradients. The paper makes two technical contributions to address this. First, a new method using product sampling is proposed, which gives unbiased estimates of gradients (uPS) for the objective function. Second, a theoretical analysis of biased Monte Carlo estimates (bMC) shows that stochastic gradient descent converges despite the biased gradients. Experiments demonstrate the empirical success of DLM. A comparison of the sampling methods shows that, while uPS is potentially more sample-efficient, bMC provides a better trade-off in terms of convergence time and computational efficiency.
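
The bias that arises from simple sampling of a log-of-expectation term can be illustrated with a small numerical sketch. The setup below is an assumption made for illustration and is not taken from the paper: a one-dimensional Gaussian q(f) and a Gaussian likelihood, chosen only because the exact value of log E_q[p(y|f)] is then available for comparison. The function names and parameter values are hypothetical. The sketch shows the bias of the objective estimate itself (by Jensen's inequality, the log of a sample mean underestimates the log of the true mean on average); it does not implement uPS or the paper's bMC analysis.

```python
import numpy as np

def exact_log_pred(y, m, v, s2):
    """Exact log E_{q(f)}[N(y | f, s2)] for q(f) = N(m, v).

    The expectation has the closed form N(y | m, v + s2).
    """
    var = v + s2
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - m) ** 2 / var

def mc_log_pred(y, m, v, s2, n_samples, n_repeats, rng):
    """Plain Monte Carlo estimates of the same quantity.

    Each row draws n_samples values of f from q and returns the log of
    the sample mean of the likelihood, giving n_repeats estimates.
    """
    f = rng.normal(m, np.sqrt(v), size=(n_repeats, n_samples))
    lik = np.exp(-0.5 * (y - f) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
    return np.log(lik.mean(axis=1))

# Hypothetical values, chosen only to make the bias easy to see.
rng = np.random.default_rng(0)
y, m, v, s2 = 1.5, 0.0, 1.0, 0.1
exact = exact_log_pred(y, m, v, s2)

bias = {}
for n in (1, 10, 100):
    estimates = mc_log_pred(y, m, v, s2, n_samples=n, n_repeats=20000, rng=rng)
    # Average over many repeats to expose the systematic error.
    bias[n] = estimates.mean() - exact
    print(f"samples per estimate = {n:3d}, average bias = {bias[n]:.4f}")
```

In this toy setting the average error is negative for every sample size and shrinks as more samples are used per estimate, which is consistent with the abstract's remark that simple sampling is biased.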