MLApr 19, 2025
Optimal Scheduling of Dynamic TransportPanos Tsimpos, Zhi Ren, Jakob Zech et al.
Flow-based methods for sampling and generative modeling use continuous-time dynamical systems to represent a {transport map} that pushes forward a source measure to a target measure. The introduction of a time axis provides considerable design freedom, and a central question is how to exploit this freedom. Though many popular methods seek straight line (i.e., zero acceleration) trajectories, we show here that a specific class of ``curved'' trajectories can significantly improve approximation and learning. In particular, we consider the unit-time interpolation of any given transport map $T$ and seek the schedule $τ: [0,1] \to [0,1]$ that minimizes the spatial Lipschitz constant of the corresponding velocity field over all times $t \in [0,1]$. This quantity is crucial as it allows for control of the approximation error when the velocity field is learned from data. We show that, for a broad class of source/target measures and transport maps $T$, the \emph{optimal schedule} can be computed in closed form, and that the resulting optimal Lipschitz constant is \emph{exponentially smaller} than that induced by an identity schedule (corresponding to, for instance, the Wasserstein geodesic). Our proof technique relies on the calculus of variations and $Γ$-convergence, allowing us to approximate the aforementioned degenerate objective by a family of smooth, tractable problems.
LGFeb 6, 2025
Distribution learning via neural differential equations: minimal energy regularization and approximation theoryYoussef Marzouk, Zhi Ren, Jakob Zech
Neural ordinary differential equations (ODEs) provide expressive representations of invertible transport maps that can be used to approximate complex probability distributions, e.g., for generative modeling, density estimation, and Bayesian inference. We show that for a large class of transport maps $T$, there exists a time-dependent ODE velocity field realizing a straight-line interpolation $(1-t)x + tT(x)$, $t \in [0,1]$, of the displacement induced by the map. Moreover, we show that such velocity fields are minimizers of a training objective containing a specific minimum-energy regularization. We then derive explicit upper bounds for the $C^k$ norm of the velocity field that are polynomial in the $C^k$ norm of the corresponding transport map $T$; in the case of triangular (Knothe--Rosenblatt) maps, we also show that these bounds are polynomial in the $C^k$ norms of the associated source and target densities. Combining these results with stability arguments for distribution approximation via ODEs, we show that Wasserstein or Kullback--Leibler approximation of the target distribution to any desired accuracy $ε> 0$ can be achieved by a deep neural network representation of the velocity field whose size is bounded explicitly in terms of $ε$, the dimension, and the smoothness of the source and target densities. The same neural network ansatz yields guarantees on the value of the regularized training objective.
STSep 3, 2023
Distribution learning via neural differential equations: a nonparametric statistical perspectiveYoussef Marzouk, Zhi Ren, Sven Wang et al.
Ordinary differential equations (ODEs), via their induced flow maps, provide a powerful framework to parameterize invertible transformations for the purpose of representing complex probability distributions. While such models have achieved enormous success in machine learning, particularly for generative modeling and density estimation, little is known about their statistical properties. This work establishes the first general nonparametric statistical convergence analysis for distribution learning via ODE models trained through likelihood maximization. We first prove a convergence theorem applicable to arbitrary velocity field classes $\mathcal{F}$ satisfying certain simple boundary constraints. This general result captures the trade-off between approximation error (`bias') and the complexity of the ODE model (`variance'). We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal F$. We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal F$: $C^k$ functions and neural networks. The latter is the practically important case of neural ODEs. Our proof techniques require a careful synthesis of (i) analytical stability results for ODEs, (ii) classical theory for sieved M-estimators, and (iii) recent results on approximation rates and metric entropies of neural network classes. The results also provide theoretical insight on how the choice of velocity field class, and the dependence of this choice on sample size $n$ (e.g., the scaling of width, depth, and sparsity of neural network classes), impacts statistical performance.
MLJun 18, 2019
Consistency of semi-supervised learning algorithms on graphs: Probit and one-hot methodsFranca Hoffmann, Bamdad Hosseini, Zhi Ren et al.
Graph-based semi-supervised learning is the problem of propagating labels from a small number of labelled data points to a larger set of unlabelled data. This paper is concerned with the consistency of optimization-based techniques for such problems, in the limit where the labels have small noise and the underlying unlabelled data is well clustered. We study graph-based probit for binary classification, and a natural generalization of this method to multi-class classification using one-hot encoding. The resulting objective function to be optimized comprises the sum of a quadratic form defined through a rational function of the graph Laplacian, involving only the unlabelled data, and a fidelity term involving only the labelled data. The consistency analysis sheds light on the choice of the rational function defining the optimization.