Recommendations for Baselines and Benchmarking Approximate Gaussian Processes
This work addresses inconsistent benchmarking of approximate Gaussian processes, a problem for the ML researchers who use them, though the contribution is incremental in nature.
The paper tackles the lack of clear recommendations for comparing Gaussian process approximations by proposing a specification for user expectations and developing a tuning-free training procedure for a variational method, showing it serves as a strong baseline.
Gaussian processes (GPs) are a mature and widely used component of the ML toolbox. One of their desirable qualities is automatic hyperparameter selection, which allows for training without user intervention. However, in many realistic settings approximations are needed, and these typically do require tuning. We argue that this requirement for tuning complicates evaluation, which has led to a lack of clear recommendations on which method should be used in which situation. To address this, we make recommendations for comparing GP approximations based on a specification of what a user should expect from a method. In addition, we develop a training procedure for the variational method of Titsias [2009] that leaves no choices to the user, and show that this is a strong baseline that meets our specification. We conclude that benchmarking according to our suggestions gives a clearer view of the current state of the field, and uncovers open problems that future papers should address.
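For readers unfamiliar with the variational method of Titsias [2009], the quantity it maximises is the collapsed lower bound on the log marginal likelihood, log N(y | 0, Q + s2 I) - tr(K - Q) / (2 s2), where Q = Kxz Kzz^-1 Kzx is built from inducing inputs z and s2 is the noise variance. The sketch below evaluates that bound with NumPy. It is an illustration only and is not the tuning-free training procedure proposed in this paper: the squared-exponential kernel, the fixed hyperparameter values, the jitter, and the names `rbf_kernel` and `collapsed_elbo` are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel (an assumed choice, not taken from the paper)."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def collapsed_elbo(x, y, z, noise_var=0.1, lengthscale=1.0, variance=1.0,
                   jitter=1e-6):
    """Collapsed variational bound for sparse GP regression.

    x: (n, d) inputs, y: (n,) targets, z: (m, d) inducing inputs.
    Hyperparameters are held fixed here; the values are illustrative.
    """
    n, m = x.shape[0], z.shape[0]
    kzz = rbf_kernel(z, z, lengthscale, variance) + jitter * np.eye(m)
    kzx = rbf_kernel(z, x, lengthscale, variance)

    # A satisfies Q = noise_var * A.T @ A, so the n x n matrix is never formed.
    l_z = np.linalg.cholesky(kzz)
    a = np.linalg.solve(l_z, kzx) / np.sqrt(noise_var)
    b = np.eye(m) + a @ a.T
    l_b = np.linalg.cholesky(b)
    c = np.linalg.solve(l_b, a @ y) / np.sqrt(noise_var)

    # log N(y | 0, Q + noise_var * I) via the matrix determinant and
    # inversion lemmas.
    log_det = 2.0 * np.sum(np.log(np.diag(l_b))) + n * np.log(noise_var)
    quad = y @ y / noise_var - c @ c
    log_marginal = -0.5 * (n * np.log(2.0 * np.pi) + log_det + quad)

    # Trace penalty: tr(K) = n * variance for this kernel, and
    # tr(Q) = noise_var * sum(A ** 2).
    trace_term = -0.5 * (n * variance - noise_var * np.sum(a**2)) / noise_var
    return log_marginal + trace_term


# Usage on synthetic data: more inducing points should not loosen the bound much.
rng = np.random.default_rng(0)
x_demo = rng.uniform(-3.0, 3.0, size=(30, 1))
y_demo = np.sin(x_demo[:, 0]) + 0.1 * rng.standard_normal(30)
elbo_sparse = collapsed_elbo(x_demo, y_demo, x_demo[::6])
elbo_full = collapsed_elbo(x_demo, y_demo, x_demo)
```

In practice the choices this sketch leaves fixed (number and placement of inducing points, hyperparameter optimisation, stopping criteria) are the kind of decisions a user would otherwise have to tune.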