Predicting with Distributions
This paper introduces a new learning model that generalizes standard models in machine learning theory. The contribution appears primarily theoretical, with no immediate practical applications.
The paper studies a setting in which an unknown function maps each input not to a single output but to an entire distribution over outputs. The main result shows, via general reductions, that virtually every combination of PAC learning algorithms for the function class and the distribution class yields an efficient algorithm in this model.
We consider a new learning model in which a joint distribution over vector pairs $(x,y)$ is determined by an unknown function $c(x)$ that maps input vectors $x$ not to individual outputs, but to entire {\em distributions\/} over output vectors $y$. Our main results take the form of rather general reductions from our model to algorithms for PAC learning the function class and the distribution class separately, and show that virtually every such combination yields an efficient algorithm in our model. Our methods include a randomized reduction to classification noise and an application of Le Cam's method to obtain robust learning algorithms.
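To make the data-generating process concrete, the following is a minimal sketch of how pairs $(x,y)$ could arise in such a model. Everything specific here is an assumption for illustration and is not taken from the paper: the target function `target` (a fixed halfspace with two possible values), the table `BIAS` of two product distributions over $\{0,1\}^3$, and the uniform input distribution on the unit square.

```python
import random

def target(x):
    # Hypothetical target function c: a fixed halfspace over 2-D inputs.
    # Its value (0 or 1) selects which output distribution generates y.
    return 1 if x[0] + x[1] > 1.0 else 0

# Hypothetical output distributions: product distributions over {0,1}^3,
# each given by its per-coordinate probability of a 1.
BIAS = {
    0: (0.1, 0.2, 0.1),
    1: (0.9, 0.7, 0.8),
}

def sample_pair(rng):
    # Draw x from an assumed input distribution (uniform on the unit square),
    # then draw y from the distribution selected by c(x).
    x = (rng.random(), rng.random())
    biases = BIAS[target(x)]
    y = tuple(1 if rng.random() < p else 0 for p in biases)
    return x, y

def sample(n, seed=0):
    # Draw n independent (x, y) pairs with a fixed seed for reproducibility.
    rng = random.Random(seed)
    return [sample_pair(rng) for _ in range(n)]
```

A learner in this toy setting sees only the pairs returned by `sample(n)`, never the value of `target(x)`, and must recover both the selecting function and the output distributions well enough to predict the distribution of $y$ for a new $x$.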