ML · LG · Jun 22, 2019

Multi-task Learning for Aggregated Data using Gaussian Processes

arXiv:1906.09412v4 · 41 citations
Originality: Incremental advance
AI Analysis

This work addresses a common problem in fields such as epidemiology and demography, where data are typically reported as aggregates. It offers a scalable multi-task learning approach in which each task may have its own likelihood.

The authors address learning from data aggregated at different scales with a multi-task Gaussian process model. Each task is a linear combination of latent processes integrated at a task-specific scale, and the cross-covariances between tasks are computed analytically or numerically. A variational lower bound that can be optimised stochastically makes the model suitable for larger datasets. They demonstrate the model on a synthetic example, a fertility dataset, and an air pollution prediction application.

Aggregated data is commonplace in areas such as epidemiology and demography. For example, census data for a population is usually given as averages defined over time periods or spatial resolutions (cities, regions or countries). In this paper, we present a novel multi-task learning model based on Gaussian processes for joint learning of variables that have been aggregated at different input scales. Our model represents each task as the linear combination of the realizations of latent processes that are integrated at a different scale per task. We are then able to compute the cross-covariance between the different tasks either analytically or numerically. We also allow each task to have a potentially different likelihood model and provide a variational lower bound that can be optimised in a stochastic fashion, making our model suitable for larger datasets. We show examples of the model in a synthetic example, a fertility dataset, and an air pollution prediction application.
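
To make the "analytically or numerically" remark concrete, here is a minimal sketch of the covariance between two interval averages of a single latent process. It is not the authors' implementation. It assumes a one-dimensional input, a squared-exponential (RBF) kernel, and aggregation by averaging over an interval; the function names (`rbf`, `avg_cov_analytic`, `avg_cov_numeric`) and the midpoint quadrature are illustrative choices, not taken from the paper. Under these assumptions the covariance between the averages over [a, b] and [c, d] is the double integral of k(x, x') over both intervals divided by (b - a)(d - c), which has a closed form in terms of the error function.

```python
from math import erf, exp, pi, sqrt


def rbf(x, y, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel k(x, y) (an assumed choice of kernel)."""
    return variance * exp(-0.5 * ((x - y) / lengthscale) ** 2)


def _g(t, lengthscale):
    """Second antiderivative of exp(-t^2 / (2 l^2)): g''(t) is that Gaussian."""
    l = lengthscale
    return (l ** 2 * exp(-0.5 * (t / l) ** 2)
            + t * l * sqrt(pi / 2) * erf(t / (l * sqrt(2))))


def avg_cov_analytic(a, b, c, d, variance=1.0, lengthscale=1.0):
    """Closed-form covariance between the averages over [a, b] and [c, d]."""
    # Integrating k(x - x') over x in [a, b] and x' in [c, d] gives
    # g(b - c) + g(a - d) - g(a - c) - g(b - d).
    integral = (_g(b - c, lengthscale) + _g(a - d, lengthscale)
                - _g(a - c, lengthscale) - _g(b - d, lengthscale))
    return variance * integral / ((b - a) * (d - c))


def avg_cov_numeric(a, b, c, d, variance=1.0, lengthscale=1.0, n=200):
    """Same quantity by a tensor-product midpoint rule with n points per axis."""
    hx, hy = (b - a) / n, (d - c) / n
    xs = [a + (i + 0.5) * hx for i in range(n)]
    ys = [c + (j + 0.5) * hy for j in range(n)]
    total = sum(rbf(x, y, variance, lengthscale) for x in xs for y in ys)
    # Quadrature weights hx * hy divided by the area (b - a)(d - c) equal 1 / n^2.
    return total / (n * n)


if __name__ == "__main__":
    # Two tasks observed at different scales: a short and a long interval.
    print(avg_cov_analytic(0.0, 1.0, 0.5, 2.5))
    print(avg_cov_numeric(0.0, 1.0, 0.5, 2.5))
```

The two functions should agree closely for smooth kernels. The numerical route is the fallback when a kernel has no closed-form double integral, at the cost of n² kernel evaluations per pair of regions.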

Code Implementations: 2 repos