Distribution-Free Distribution Regression
The paper provides a distribution-free framework for regression tasks in which the covariates are probability distributions. The contribution is incremental: it relaxes distributional assumptions but does not introduce a new paradigm.
The paper tackles the problem of distribution regression without assuming specific distributions for the error term or the covariate, and proves that the excess prediction risk converges to zero at a polynomial rate when the effective dimension, as measured by the doubling dimension, is small enough.
"Distribution regression" refers to the situation where a response Y depends on a covariate P, where P is a probability distribution. The model is Y = f(P) + mu, where f is an unknown regression function and mu is a random error. Typically, we do not observe P directly; rather, we observe a sample from P. In this paper we develop theory and methods for distribution-free versions of distribution regression. This means that we do not make distributional assumptions about the error term mu or the covariate P. We prove that when the effective dimension is small enough (as measured by the doubling dimension), the excess prediction risk converges to zero at a polynomial rate.
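The two-stage setup described above (a latent distribution P, a response Y = f(P) + mu, and only a sample from P observed) can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the paper's estimator: the Gaussian covariates, the choice of f as the mean of P, the uniform noise, the 1-Wasserstein distance, and the k-nearest-neighbour predictor are all hypothetical choices made here for concreteness, as are the function names.

```python
import random


def wasserstein1(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.

    For equal-size samples on the real line this is the mean absolute
    difference between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def simulate(n, m, rng):
    """Draw n pairs (sample from P_i, Y_i) with Y_i = f(P_i) + mu_i."""
    data = []
    for _ in range(n):
        theta = rng.uniform(-2.0, 2.0)  # assumed: P_i = Normal(theta, 1)
        # P_i itself is never observed, only m draws from it.
        sample = [rng.gauss(theta, 1.0) for _ in range(m)]
        # Assumed for illustration: f(P) = mean of P, noise uniform on [-0.5, 0.5].
        y = theta + rng.uniform(-0.5, 0.5)
        data.append((sample, y))
    return data


def knn_predict(train, query_sample, k):
    """Average the responses of the k training samples closest to the query."""
    nearest = sorted(train, key=lambda pair: wasserstein1(pair[0], query_sample))[:k]
    return sum(y for _, y in nearest) / k


rng = random.Random(0)
train = simulate(n=200, m=50, rng=rng)
test = simulate(n=50, m=50, rng=rng)

# Empirical prediction risk (mean squared error) on held-out pairs.
mse = sum((knn_predict(train, s, k=5) - y) ** 2 for s, y in test) / len(test)
print("held-out mean squared error:", mse)
```

The predictor never uses the form of P or of the noise; it relies only on a metric between the observed samples, which is the sense in which such a method can be called distribution-free. The family of distributions here is indexed by a single parameter, which is the kind of low effective dimension under which a polynomial rate is plausible.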