OCJan 14, 2013
On the Multi-Dimensional Controller and Stopper GamesErhan Bayraktar, Yu-Jui Huang
We consider a zero-sum stochastic differential controller-and-stopper game in which the state process is a controlled diffusion evolving in a multi-dimensional Euclidean space. In this game, the controller affects both the drift and the volatility terms of the state process. Under appropriate conditions, we show that the game has a value and the value function is the unique viscosity solution to an obstacle problem for a Hamilton-Jacobi-Bellman equation.
LGMay 5, 2022
GANs as Gradient Flows that ConvergeYu-Jui Huang, Yuchong Zhang
This paper approaches the unsupervised learning problem by gradient descent in the space of probability density functions. A main result shows that along the gradient flow induced by a distribution-dependent ordinary differential equation (ODE), the unknown data distribution emerges as the long-time limit. That is, one can uncover the data distribution by simulating the distribution-dependent ODE. Intriguingly, the simulation of the ODE is shown equivalent to the training of generative adversarial networks (GANs). This equivalence provides a new "cooperative" view of GANs and, more importantly, sheds new light on the divergence of GANs. In particular, it reveals that the GAN algorithm implicitly minimizes the mean squared error (MSE) between two sets of samples, and this MSE fitting alone can cause GANs to diverge. To construct a solution to the distribution-dependent ODE, we first show that the associated nonlinear Fokker-Planck equation has a unique weak solution, by the Crandall-Liggett theorem for differential equations in Banach spaces. Based on this solution to the Fokker-Planck equation, we construct a unique solution to the ODE, using Trevisan's superposition principle. The convergence of the induced gradient flow to the data distribution is obtained by analyzing the Fokker-Planck equation.
LGSep 8, 2025
On optimal solutions of classical and sliced Wasserstein GANs with non-Gaussian dataYu-Jui Huang, Hsin-Hua Shen, Yu-Chih Huang et al.
The generative adversarial network (GAN) aims to approximate an unknown distribution via a parameterized neural network (NN). While GANs have been widely applied in reinforcement and semisupervised learning as well as computer vision tasks, selecting their parameters often needs an exhaustive search and only a few selection methods can be proved to be theoretically optimal. One of the most promising GAN variants is the Wasserstein GAN (WGAN). Prior work on optimal parameters for WGAN is limited to the linear-quadratic-Gaussian (LQG) setting, where the NN is linear and the data is Gaussian. In this paper, we focus on the characterization of optimal WGAN parameters beyond the LQG setting. We derive closed-form optimal parameters for one-dimensional WGANs when the NN has non-linear activation functions and the data is non-Gaussian. To extend this to high-dimensional WGANs, we adopt the sliced Wasserstein framework and replace the constraint on marginal distributions of the randomly projected data by a constraint on the joint distribution of the original (unprojected) data. We show that the linear generator can be asymptotically optimal for sliced WGAN with non-Gaussian data. Empirical studies show that our closed-form WGAN parameters have good convergence behavior with data under both Gaussian and Laplace distributions. Also, compared to the r principal component analysis (r-PCA) solution, our proposed solution for sliced WGAN can achieve the same performance while requiring less computational resources.
OCJul 28, 2025
Mean-Field Langevin Diffusions with Density-dependent TemperatureYu-Jui Huang, Zachariah Malik
In the context of non-convex optimization, we let the temperature of a Langevin diffusion to depend on the diffusion's own density function. The rationale is that the induced density captures to some extent the landscape imposed by the non-convex function to be minimized, such that a density-dependent temperature provides location-wise random perturbation that may better react to, for instance, the location and depth of local minimizers. As the Langevin dynamics is now self-regulated by its own density, it forms a mean-field stochastic differential equation (SDE) of the Nemytskii type, distinct from the standard McKean-Vlasov equations. Relying on Wasserstein subdifferential calculus, we first show that the corresponding (nonlinear) Fokker-Planck equation has a unique solution. Next, a weak solution to the SDE is constructed from the solution to the Fokker-Planck equation, by Trevisan's superposition principle. As time goes to infinity, we further show that the induced density converges to an invariant distribution, which admits an explicit formula in terms of the Lambert $W$ function. A numerical example suggests that the density-dependent temperature can simultaneously improve the accuracy of and rate of convergence to the estimate of global minimizers.
MLJun 19, 2024
Generative Modeling by Minimizing the Wasserstein-2 LossYu-Jui Huang, Zachariah Malik
This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss) through a distribution-dependent ordinary differential equation (ODE), whose dynamics involves the Kantorovich potential associated with the true data distribution and a current estimate of it. A main result shows that the time-marginal laws of the ODE form a gradient flow for the $W_2$ loss, which converges exponentially to the true data distribution. An Euler scheme for the ODE is proposed and it is shown to recover the gradient flow for the $W_2$ loss in the limit. An algorithm is designed by following the scheme and applying persistent training, which naturally fits our gradient-flow approach. In both low- and high-dimensional experiments, our algorithm outperforms Wasserstein generative adversarial networks by increasing the level of persistent training appropriately.