Information Newton's flow: second-order optimization method in probability space
This work addresses optimization in probability space for researchers in machine learning and statistics, offering a framework that incrementally extends existing gradient flow concepts to second-order methods.
The paper tackles second-order optimization in probability space by introducing information Newton's flows, which extend Newton's method to probability space under the Fisher-Rao and Wasserstein-2 metrics. It demonstrates their effectiveness on Bayesian sampling examples, supported by convergence results and numerical implementations.
We introduce a framework for Newton's flows in probability space with information metrics, named information Newton's flows. We consider two information metrics: the Fisher-Rao metric and the Wasserstein-2 metric. It is known that overdamped Langevin dynamics correspond to the Wasserstein gradient flow of the Kullback-Leibler (KL) divergence. Extending this fact to Wasserstein Newton's flows, we derive Newton's Langevin dynamics. We provide examples of Newton's Langevin dynamics in both one-dimensional space and Gaussian families. For the numerical implementation, we design sampling-efficient variational methods in affine models and reproducing kernel Hilbert spaces (RKHS) to approximate Wasserstein Newton's directions. We also establish convergence results for the proposed information Newton's method with approximated directions. Several numerical examples from Bayesian sampling problems demonstrate the effectiveness of the proposed method.
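As a point of reference for the known fact cited in the abstract, the following is a minimal sketch of the first-order baseline: overdamped Langevin dynamics discretized with an Euler-Maruyama step (the unadjusted Langevin algorithm). It is not the paper's Newton's Langevin dynamics. The one-dimensional Gaussian target, the function names, and the step size, particle count, and iteration count are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed target: N(TARGET_MEAN, TARGET_VAR), i.e. density proportional to
# exp(-f(x)) with potential f(x) = (x - TARGET_MEAN)^2 / (2 * TARGET_VAR).
TARGET_MEAN = 1.0
TARGET_VAR = 4.0


def grad_potential(x):
    """Gradient of the assumed potential f at x."""
    return (x - TARGET_MEAN) / TARGET_VAR


def langevin_samples(n_particles=5000, n_steps=2000, step=0.05, seed=0):
    """Euler-Maruyama discretization of dX = -grad f(X) dt + sqrt(2) dB.

    In continuous time, the law of X follows the Wasserstein gradient
    flow of the KL divergence to the target; the discretization adds a
    small bias that shrinks with the step size.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)  # particles start from N(0, 1)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        x = x - step * grad_potential(x) + np.sqrt(2.0 * step) * noise
    return x


samples = langevin_samples()
print("empirical mean:", samples.mean())
print("empirical variance:", samples.var())
```

With these assumed settings the empirical mean and variance of the particles should land close to those of the target. A second-order method of the kind the abstract proposes would replace the gradient direction with an approximated Newton's direction, which this sketch does not attempt.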