Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization
This work addresses a computational bottleneck in Wasserstein gradient approximation that matters to researchers in machine learning and scientific computing. The contribution is incremental: it builds on existing variational methods and adds a specific convex relaxation.
The paper tackles the problem of approximating the Wasserstein gradient direction, which is crucial for posterior sampling and scientific computing. It derives a convex semi-definite programming (SDP) relaxation of the underlying variational problem for two-layer networks with squared-ReLU activations. Solving the SDP yields the optimal approximation in this function class, and experiments on PDE-constrained Bayesian inference and COVID-19 modeling demonstrate its effectiveness.
Computing the Wasserstein gradient direction is essential for posterior sampling problems and scientific computing. Approximating the Wasserstein gradient from finite samples requires solving a variational problem. We study this variational problem over the family of two-layer networks with squared-ReLU activations, for which we derive a semi-definite programming (SDP) relaxation. This SDP can be viewed as an approximation of the Wasserstein gradient in a broader function family that includes two-layer networks. By solving the convex SDP, we obtain the optimal approximation of the Wasserstein gradient direction in this class of functions. Numerical experiments, including PDE-constrained Bayesian inference and parameter estimation in COVID-19 modeling, demonstrate the effectiveness of the proposed method.
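
To make the variational problem concrete, the following is a minimal sketch of what a finite-sample objective over two-layer squared-ReLU networks could look like. It is not the SDP relaxation itself, and the abstract does not spell out the functional, so several things here are assumptions: that the functional is the KL divergence to a target density pi, that the objective takes the integration-by-parts form (1/2) E|grad Phi|^2 + E<grad log pi, grad Phi> + E[Laplacian Phi], and that the network is parametrized as Phi(x) = sum_i alpha_i * relu(w_i . x + b_i)^2. The names `phi`, `grad_phi`, `laplacian_phi`, `empirical_objective`, and `score_pi` are illustrative only.

```python
import numpy as np

def phi(x, W, b, alpha):
    """Two-layer squared-ReLU network (assumed form).

    x: (n, d) samples, W: (m, d) weights, b: (m,) biases, alpha: (m,) outer weights.
    Returns Phi evaluated at each sample, shape (n,).
    """
    pre = x @ W.T + b                       # (n, m) pre-activations
    return (np.maximum(pre, 0.0) ** 2) @ alpha

def grad_phi(x, W, b, alpha):
    """Closed-form gradient of Phi with respect to x, shape (n, d)."""
    act = np.maximum(x @ W.T + b, 0.0)      # (n, m)
    return 2.0 * (act * alpha) @ W

def laplacian_phi(x, W, b, alpha):
    """Closed-form Laplacian of Phi, shape (n,), valid away from the ReLU kinks."""
    active = (x @ W.T + b > 0).astype(float)        # (n, m) indicator of active units
    return 2.0 * active @ (alpha * np.sum(W ** 2, axis=1))

def empirical_objective(x, score_pi, W, b, alpha):
    """Sample average of the assumed variational objective.

    score_pi: (n, d) array holding grad log pi at each sample (assumed available).
    """
    g = grad_phi(x, W, b, alpha)
    lap = laplacian_phi(x, W, b, alpha)
    return np.mean(0.5 * np.sum(g ** 2, axis=1) + np.sum(score_pi * g, axis=1) + lap)
```

Under these assumptions, a minimizer's gradient would approximate grad log rho minus grad log pi on the samples, where rho is the sampling density. Because the objective is non-convex in (W, b), direct minimization can stall in poor local minima, which is the motivation for a convex SDP relaxation.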