OC LG NA PR ML Sep 25, 2024

Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks

arXiv:2409.17107v25.61 citations h-index: 2 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of training neural networks with ReLU activations, whose gradients are discontinuous, by extending the SGHMC analysis to that setting. The advance is incremental: it builds on existing SGHMC results and focuses on specific applications.

The paper analyzes the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm for non-convex stochastic optimization problems with discontinuous stochastic gradients, such as the training of ReLU neural networks, and derives explicit upper bounds on the expected excess risk that can be made arbitrarily small.

In this paper, we provide a non-asymptotic analysis of the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm to a target measure in Wasserstein-1 and Wasserstein-2 distance. Crucially, compared to the existing literature on SGHMC, we allow its stochastic gradient to be discontinuous. This allows us to provide explicit upper bounds, which can be controlled to be arbitrarily small, for the expected excess risk of non-convex stochastic optimization problems with discontinuous stochastic gradients, including, among others, the training of neural networks with ReLU activation function. To illustrate the applicability of our main results, we consider numerical experiments on quantile estimation and on several optimization problems involving ReLU neural networks relevant in finance and artificial intelligence.
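
As a rough illustration of the setting described in the abstract, the following Python sketch applies an SGHMC-style recursion to quantile estimation, where the stochastic gradient of the pinball loss is an indicator function and therefore discontinuous in the parameter. It is a minimal sketch under stated assumptions, not the authors' implementation: the Euler-type update shown is one common discretization of underdamped Langevin dynamics, and the function names (`pinball_grad`, `sghmc_quantile`), the hyperparameter values, the absence of any regularization term, and the tail averaging of iterates are all illustrative choices that do not come from the paper.

```python
import math
import random


def pinball_grad(theta, x, q):
    """Stochastic gradient in theta of the pinball loss for level q.

    The loss is (x - theta) * (q - 1[x < theta]); its derivative in theta is
    1[x < theta] - q, which jumps as theta crosses the sample x.
    """
    return (1.0 if x < theta else 0.0) - q


def sghmc_quantile(q, sampler, n_steps=50_000, step=1e-2, gamma=1.0,
                   beta=1e4, theta0=0.0, seed=0):
    """Estimate the q-quantile of the law of sampler(rng) with an SGHMC-style recursion.

    All hyperparameter defaults are hypothetical, chosen only for this demo.
    Returns the average of theta over the second half of the run.
    """
    rng = random.Random(seed)
    theta, v = theta0, 0.0
    # Standard deviation of the Gaussian noise injected into the momentum.
    noise_scale = math.sqrt(2.0 * gamma * step / beta)
    burn_in = n_steps // 2
    tail_sum = 0.0
    for n in range(n_steps):
        x = sampler(rng)                 # fresh data point for this step
        g = pinball_grad(theta, x, q)    # discontinuous stochastic gradient
        xi = rng.gauss(0.0, 1.0)
        # Both updates use the previous (theta, v) pair.
        theta, v = (theta + step * v,
                    v - step * (gamma * v + g) + noise_scale * xi)
        if n >= burn_in:
            tail_sum += theta
    return tail_sum / (n_steps - burn_in)


if __name__ == "__main__":
    # 0.9-quantile of a standard normal; the exact value is about 1.2816.
    estimate = sghmc_quantile(0.9, lambda rng: rng.gauss(0.0, 1.0))
    print(f"estimated 0.9-quantile: {estimate:.3f}")
```

Averaging the second half of the trajectory only reduces the fluctuation of the reported estimate in this demo. The bounds described in the abstract concern the law of the iterates themselves, in Wasserstein-1 and Wasserstein-2 distance.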
