ML · LG · Sep 6, 2018

Gaussian Process Regression for Binned Data

arXiv:1809.02010v2 · 8 citations
AI Analysis

This paper addresses a common data-representation issue in statistics and machine learning and is relevant to researchers and practitioners working with binned datasets. The contribution is incremental, since it adapts existing GP methods.

The paper tackles regression on binned data. Common shortcuts, such as reading off bin heights or interpolating between bins, either ignore neighbouring bins or misestimate the true bin integrals. The authors instead propose a Gaussian Process regression method that makes probabilistic predictions of the latent function behind the bins, giving a more precise density for predictions.

Many datasets are in the form of tables of binned data. Performing regression on these data usually involves either reading off bin heights, ignoring data from neighbouring bins, or interpolating between bins, thus over- or underestimating the true bin integrals. In this paper we propose an elegant method for performing Gaussian Process (GP) regression given such binned data, allowing one to make probabilistic predictions of the latent function which produced the binned data. We look at several applications: first, differentially private regression; second, making predictions over other integrals; and third, cases where the input regions are irregularly shaped collections of polytopes. In summary, our method provides an effective way of analysing binned data such that one can use more information from the histogram representation, and thus reconstruct a more useful and precise density for making predictions.
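
To make the idea of regressing on bin integrals concrete, here is a minimal one-dimensional sketch. It is not the authors' implementation. It assumes a zero-mean GP with a squared-exponential kernel, treats each observation as the integral of the latent function over its bin, and approximates those integrals with Gauss-Legendre quadrature. The function names (`rbf`, `bin_quadrature`, `predict_latent`) and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np

def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def bin_quadrature(edges, n_nodes=8):
    """Return quadrature nodes and a matrix W such that W @ f(nodes)
    approximates the integral of f over each bin [edges[i], edges[i+1]]."""
    t, w = np.polynomial.legendre.leggauss(n_nodes)  # rule on [-1, 1]
    lo, hi = edges[:-1], edges[1:]
    half = 0.5 * (hi - lo)   # half-width of each bin
    mid = 0.5 * (hi + lo)    # centre of each bin
    nodes = (mid[:, None] + half[:, None] * t[None, :]).ravel()
    n_bins = len(lo)
    W = np.zeros((n_bins, n_bins * n_nodes))
    for i in range(n_bins):
        W[i, i * n_nodes:(i + 1) * n_nodes] = half[i] * w
    return nodes, W

def predict_latent(edges, bin_integrals, x_star,
                   noise_var=1e-4, variance=1.0, lengthscale=1.0):
    """Posterior mean and covariance of the latent function at x_star,
    given noisy observations of its integral over each bin."""
    nodes, W = bin_quadrature(edges)
    # Integration is linear, so covariances of integrals are integrals
    # of the kernel; here they are approximated via W (an assumption of
    # this sketch, standing in for any closed-form integral kernel).
    K_bb = W @ rbf(nodes, nodes, variance, lengthscale) @ W.T
    K_sb = rbf(x_star, nodes, variance, lengthscale) @ W.T
    K_ss = rbf(x_star, x_star, variance, lengthscale)
    L = np.linalg.cholesky(K_bb + noise_var * np.eye(len(bin_integrals)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, bin_integrals))
    V = np.linalg.solve(L, K_sb.T)
    return K_sb @ alpha, K_ss - V.T @ V

# Toy data: exact integrals of sin(x) over eight bins on [0, 4].
edges = np.linspace(0.0, 4.0, 9)
bin_integrals = np.cos(edges[:-1]) - np.cos(edges[1:])
x_star = np.array([1.0, 2.0, 3.0])
mean, cov = predict_latent(edges, bin_integrals, x_star)
```

Because the bin observations enter only through the linear map `W`, the same structure would carry over to predicting other integrals of the latent function: one would apply a second quadrature matrix on the prediction side instead of evaluating the kernel at `x_star` directly.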

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
