Variational Learning on Aggregate Outputs with Gaussian Processes
This work addresses a critical issue in applications such as global disease mapping, where a mismatch in granularity between inputs and outputs hinders accurate prediction. It offers a scalable solution that handles uncertainty explicitly.
The paper tackles supervised learning when outputs are aggregated at a coarser level than the inputs, as in disease mapping. It proposes a variational learning approach with Gaussian processes, together with new bounds and tractable approximations for the cases where aggregation makes the standard evidence lower bounds intractable. The approach improves prediction accuracy and scales to large datasets, as demonstrated on malaria incidence modelling with over 1 million observations.
While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.
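To make the aggregation setting concrete, the following is a minimal sketch of a Poisson likelihood for aggregated counts. It is an illustration under stated assumptions, not the paper's implementation: it assumes each coarse unit (a "bag") has a count drawn from a Poisson distribution whose rate is a weighted sum of fine-scale rates, and that the fine-scale rate is the exponential of a latent function value. The function name `aggregated_poisson_loglik` and the arguments `weights` and `bag_ids` are hypothetical names introduced here.

```python
import math
import numpy as np

def aggregated_poisson_loglik(f, weights, bag_ids, y):
    """Log-likelihood of bag-level counts under an assumed aggregation model.

    f       : (N,) latent function values at the fine-scale inputs
    weights : (N,) non-negative weights per input (e.g. population; an assumption)
    bag_ids : (N,) integer bag index in [0, B) for each input
    y       : (B,) observed count for each bag
    """
    f = np.asarray(f, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bag_ids = np.asarray(bag_ids)
    y = np.asarray(y, dtype=float)

    # Fine-scale rates: weight times exp(latent value). The exp link is an assumption.
    rates = weights * np.exp(f)

    # Aggregate the fine-scale rates into one rate per bag.
    bag_rates = np.bincount(bag_ids, weights=rates, minlength=len(y))

    # Poisson log-pmf per bag: y*log(rate) - rate - log(y!).
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(bag_rates) - bag_rates - log_fact))
```

Because the logarithm is applied to a sum of exponentiated latent values, its expectation under a Gaussian posterior over `f` has no closed form in general. This is one way to see why aggregation complicates the standard evidence lower bound.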