ML LG ST · Nov 10, 2013

Fast Distribution To Real Regression

Junier B. Oliva, Willie Neiswanger, Barnabas Poczos, Jeff Schneider, Eric Xing

arXiv:1311.2236v2 · 45 citations

Originality: Incremental advance

AI Analysis

The paper addresses a computational bottleneck in distribution regression on large datasets and is an incremental improvement over prior work.

The paper tackles distribution-to-real regression, where existing methods make predicting from distribution inputs computationally expensive. It proposes the Double-Basis estimator, whose evaluation cost is independent of the dataset size while retaining a fast rate of convergence.

We study the problem of distribution to real-value regression, where one aims to regress a mapping $f$ that takes in a distribution input covariate $P\in \mathcal{I}$ (for a non-parametric family of distributions $\mathcal{I}$) and outputs a real-valued response $Y=f(P) + \epsilon$. This setting was recently studied, and a "Kernel-Kernel" estimator was introduced and shown to have a polynomial rate of convergence. However, evaluating a new prediction with the Kernel-Kernel estimator scales as $\Omega(N)$, where $N$ is the number of training instances. This creates a difficult situation in which a large amount of data may be necessary for a low estimation risk, but the computational cost of estimation becomes infeasible when the dataset is too large. To address this, we propose the Double-Basis estimator, which looks to alleviate this big data problem in two ways: first, the Double-Basis estimator is shown to have a computational complexity that is independent of the number of instances $N$ when evaluating new predictions after training; second, the Double-Basis estimator is shown to have a fast rate of convergence for a general class of mappings $f\in\mathcal{F}$.
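
The abstract does not spell out which two bases are used, so the following is only a minimal sketch of how an estimator with prediction cost independent of $N$ could look, not the authors' implementation. It assumes (1) each input distribution is observed through samples on $[0, 1]$ and summarized by its estimated coefficients in an orthonormal cosine basis, and (2) an RBF kernel on those coefficient vectors is approximated with random Fourier features, followed by ridge regression. The names `cosine_basis_coeffs` and `DoubleBasisSketch`, the parameter defaults, and the synthetic Beta-distribution data are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_basis_coeffs(samples, num_basis):
    """Estimate projection coefficients of a density on [0, 1] from samples.

    Assumed basis: phi_0(x) = 1, phi_k(x) = sqrt(2) * cos(pi * k * x).
    The k-th coefficient is estimated by the sample mean of phi_k.
    """
    x = np.asarray(samples, dtype=float)
    k = np.arange(num_basis)
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, k))  # shape (n, num_basis)
    phi[:, 0] = 1.0  # constant basis function
    return phi.mean(axis=0)


class DoubleBasisSketch:
    """Hypothetical two-stage estimator: basis projection + random features."""

    def __init__(self, num_basis=10, num_features=200, bandwidth=1.0,
                 ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.num_basis = num_basis
        self.ridge = ridge
        # Random Fourier features approximating an RBF kernel (an assumption).
        self.W = rng.normal(scale=1.0 / bandwidth,
                            size=(num_features, num_basis))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        self.weights = None

    def _features(self, sample_sets):
        A = np.stack([cosine_basis_coeffs(s, self.num_basis)
                      for s in sample_sets])
        D = self.W.shape[0]
        return np.sqrt(2.0 / D) * np.cos(A @ self.W.T + self.b)

    def fit(self, sample_sets, y):
        Z = self._features(sample_sets)  # shape (N, D)
        D = Z.shape[1]
        # Ridge regression in feature space: only a D-vector is stored.
        self.weights = np.linalg.solve(Z.T @ Z + self.ridge * np.eye(D),
                                       Z.T @ np.asarray(y, dtype=float))
        return self

    def predict(self, sample_sets):
        # Cost depends on the sample size, num_basis, and D, but not on N.
        return self._features(sample_sets) @ self.weights


# Synthetic demo: Beta distributions, response = true mean plus small noise.
rng = np.random.default_rng(1)


def make_data(n_dists, n_samples=200):
    a = rng.uniform(1.0, 5.0, size=n_dists)
    b = rng.uniform(1.0, 5.0, size=n_dists)
    sets = [rng.beta(ai, bi, size=n_samples) for ai, bi in zip(a, b)]
    y = a / (a + b) + 0.01 * rng.normal(size=n_dists)
    return sets, y


train_sets, y_train = make_data(300)
test_sets, y_test = make_data(100)
model = DoubleBasisSketch().fit(train_sets, y_train)
y_pred = model.predict(test_sets)
```

After fitting, the model keeps only the random projection and a weight vector of length `num_features`, so a new prediction never touches the training instances. A kernel smoother over the training set, by contrast, would have to compare the new input against all $N$ stored distributions.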
