Forest Kobayashi, Jonathan Hayase, Young-Heon Kim
Given $m < n$, we consider the problem of ``best'' approximating an $n\text{-d}$ probability measure $ρ$ via an $m\text{-d}$ measure $ν$ such that $\mathrm{supp}\ ν$ has bounded total ``complexity.'' When $ρ$ is concentrated near an $m\text{-d}$ set we may interpret this as a manifold learning problem with noisy data. However, we do not restrict our analysis to this case, as the more general formulation has broader applications. We quantify $ν$'s performance in approximating $ρ$ via the Monge-Kantorovich (also called Wasserstein) $p$-cost $\mathbb{W}_p^p(ρ, ν)$, and constrain the complexity by requiring $\mathrm{supp}\ ν$ to be coverable by an $f : \mathbb{R}^{m} \to \mathbb{R}^{n}$ whose $W^{k,q}$ Sobolev norm is bounded by $\ell \geq 0$. This allows us to reformulate the problem as minimizing a functional $\mathscr J_p(f)$ under the Sobolev ``budget'' $\ell$. This problem is closely related to (but distinct from) principal curves with length constraints when $m=1, k = 1$ and an unsupervised analogue of smoothing splines when $k > 1$. New challenges arise from the higher-order differentiability condition. We study the ``gradient'' of $\mathscr J_p$, which is given by a certain vector field that we call the barycenter field, and use it to prove a nontrivial (almost) strict monotonicity result. We also provide a natural discretization scheme and establish its consistency. We use this scheme as a toy model for a generative learning task, and by analogy, propose novel interpretations for the role regularization plays in improving training.