A Gaussian Comparison Theorem for Training Dynamics in Machine Learning
This work offers theoretical insight into the training dynamics of machine learning models. However, it appears incremental: it builds on existing comparison theorems and applies only to specific algorithms and models.
The authors analyze the training dynamics of algorithms trained on Gaussian mixture data by connecting the evolution of the model to a surrogate dynamical system. This allows them to rigorously validate dynamic mean-field (DMF) expressions in asymptotic regimes and to propose an iterative refinement scheme for non-asymptotic ones.
We study training algorithms with data following a Gaussian mixture model. For a specific family of such algorithms, we present a non-asymptotic result that connects the evolution of the model to a surrogate dynamical system, which can be easier to analyze. The proof of our result is based on the celebrated Gordon comparison theorem. Using our theorem, we rigorously prove the validity of the dynamic mean-field (DMF) expressions in asymptotic scenarios. Moreover, we suggest an iterative refinement scheme to obtain more accurate expressions in non-asymptotic scenarios. We specialize our theory to the analysis of training a perceptron model with a generic first-order (full-batch) algorithm and demonstrate that, in the non-asymptotic regime, fluctuation parameters emerge in addition to the DMF kernels.
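To make the setting concrete, here is a minimal simulation sketch. It assumes a symmetric two-class Gaussian mixture (labels y = ±1, inputs x = y·mu + z with standard Gaussian noise z), a linear perceptron trained with the logistic loss, and plain full-batch gradient descent as the first-order algorithm. These modeling choices, and every function name below, are hypothetical illustrations rather than the paper's construction. The sketch records two low-dimensional summary statistics (the overlap with the class mean and the squared weight norm), which are the kind of quantities a surrogate or mean-field description tracks. It does not implement the paper's surrogate system, the DMF equations, or the refinement scheme.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sample_gmm(n, d, snr, rng):
    """Hypothetical data model: y = +/-1 with equal probability and
    x = y * mu + z with z ~ N(0, I_d), where mu has Euclidean norm `snr`."""
    mu = [snr / math.sqrt(d)] * d
    xs, ys = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else -1
        xs.append([y * m + rng.gauss(0.0, 1.0) for m in mu])
        ys.append(y)
    return xs, ys, mu


def train_full_batch(xs, ys, mu, lr=0.5, steps=50):
    """Full-batch gradient descent on the logistic loss
    L(w) = (1/n) * sum_i log(1 + exp(-y_i * <w, x_i>)), starting from w = 0.
    Records (m_t, q_t) for t = 0..steps, where m_t = <w_t, mu> / ||mu|| is the
    overlap with the class mean and q_t = ||w_t||^2 is the squared norm."""
    n, d = len(xs), len(mu)
    mu_norm = math.sqrt(dot(mu, mu))
    w = [0.0] * d
    traj = []
    for _ in range(steps):
        traj.append((dot(w, mu) / mu_norm, dot(w, w)))
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            # d/dw log(1 + exp(-y <w, x>)) = -sigmoid(-y <w, x>) * y * x
            coeff = -sigmoid(-y * dot(w, x)) * y / n
            for j in range(d):
                grad[j] += coeff * x[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    traj.append((dot(w, mu) / mu_norm, dot(w, w)))
    return w, traj


rng = random.Random(0)
xs, ys, mu = sample_gmm(n=200, d=50, snr=2.0, rng=rng)
w, traj = train_full_batch(xs, ys, mu)
for t in (0, 10, 50):
    print(f"t={t:2d}  overlap m_t={traj[t][0]:.3f}  squared norm q_t={traj[t][1]:.3f}")
```

Rerunning with different seeds at a fixed ratio n/d shows how much these statistics fluctuate from one finite-size run to the next. Such sample-to-sample fluctuations vanish in the asymptotic limit, which is the regime where the DMF description applies.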