MATH-PH · LG · PR · Jan 24, 2025

Mean-field limit from general mixtures of experts to quantum neural networks

Anderson Melchor Hernandez, Davide Pastorello, Giacomo De Palma

arXiv:2501.14660v1 · 1.2 · h-index: 3

Originality: Incremental advance

AI Analysis

This work provides theoretical insight into the scaling behavior of MoE models as the number of experts grows. It is an incremental advance, of interest mainly to researchers in machine learning theory and quantum computing.

The authors study the asymptotic behavior of Mixture of Experts (MoE) models trained via gradient flow. They show that, as the number of experts diverges, the empirical measure of the expert parameters converges to a probability measure that solves a nonlinear continuity equation, with an explicit convergence rate that depends only on the number of experts. They then apply this result to an MoE generated by a quantum neural network.
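For orientation, the generic mean-field picture behind such results can be written as follows. This is a sketch of the standard form for a square loss, not necessarily the exact model or equation used in the paper; the expert map $\varphi(x;\theta)$ and potential $\Psi$ are illustrative notation. With $N$ experts of parameters $\theta_1,\dots,\theta_N$, the model output is an average against the empirical measure, each parameter follows a gradient flow driven by a measure-dependent potential, and the limit measure solves a continuity equation that is nonlinear because the velocity field depends on $\mu_t$ itself:

```latex
\mu^N_t = \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta_i(t)}, \qquad
f_{\mu}(x) = \int \varphi(x;\theta)\, \mathrm{d}\mu(\theta),

\Psi[\mu](\theta) = \mathbb{E}_{(x,y)}\!\left[\big(f_{\mu}(x) - y\big)\, \varphi(x;\theta)\right], \qquad
\dot{\theta}_i(t) = -\nabla_{\theta}\Psi[\mu^N_t]\big(\theta_i(t)\big),

\partial_t \mu_t = \nabla_{\theta}\cdot\big(\mu_t\, \nabla_{\theta}\Psi[\mu_t]\big).
```

Propagation of chaos then means that, for large $N$, the particles $\theta_i(t)$ behave approximately like independent samples from $\mu_t$.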

In this work, we study the asymptotic behavior of Mixture of Experts (MoE) trained via gradient flow on supervised learning problems. Our main result establishes the propagation of chaos for a MoE as the number of experts diverges. We demonstrate that the corresponding empirical measure of their parameters is close to a probability measure that solves a nonlinear continuity equation, and we provide an explicit convergence rate that depends solely on the number of experts. We apply our results to a MoE generated by a quantum neural network.
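To make the setting concrete, here is a minimal NumPy sketch of a mean-field-scaled mixture of experts trained by an explicit Euler discretization of gradient flow. It is an illustration under stated assumptions, not the authors' model: the hypothetical expert `phi(x; u, c, a, d) = sigmoid(u*x + c) * (a*x + d)` (a scalar gate times a linear predictor), the square loss, the uniform 1/N averaging over experts, and the step size `dt` are all choices made here for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def expert_terms(theta, x):
    """Evaluate every hypothetical expert on every sample.

    theta: shape (N, 4), columns (u, c, a, d); x: shape (M,).
    Returns gate s, linear part h, and expert output phi = s * h, each (N, M).
    """
    u, c, a, d = (theta[:, k:k + 1] for k in range(4))
    s = sigmoid(u * x + c)
    h = a * x + d
    return s, h, s * h


def moe_predict(theta, x):
    # Mean-field scaling: the output is the average over the N experts,
    # i.e. the integral of phi against the empirical measure of the parameters.
    _, _, phi = expert_terms(theta, x)
    return phi.mean(axis=0)


def loss(theta, x, y):
    # Square loss averaged over the (finite) training set.
    return 0.5 * np.mean((moe_predict(theta, x) - y) ** 2)


def particle_gradients(theta, x, y):
    """N times the gradient of the loss w.r.t. each expert's parameters,
    i.e. E_x[(f(x) - y) * grad_theta phi(x; theta_i)] for every expert i."""
    s, h, phi = expert_terms(theta, x)
    r = phi.mean(axis=0) - y          # residual f(x) - y, shape (M,)
    ds = s * (1.0 - s)                # derivative of the sigmoid gate
    return np.stack([
        np.mean(r * ds * x * h, axis=1),  # d phi / d u
        np.mean(r * ds * h, axis=1),      # d phi / d c
        np.mean(r * s * x, axis=1),       # d phi / d a
        np.mean(r * s, axis=1),           # d phi / d d
    ], axis=1)


def gradient_flow(theta, x, y, dt=0.05, steps=500):
    """Explicit Euler steps of the time-rescaled particle dynamics
    d theta_i / dt = -E_x[(f(x) - y) grad_theta phi(x; theta_i)]."""
    theta = theta.copy()
    for _ in range(steps):
        theta -= dt * particle_gradients(theta, x, y)
    return theta
```

A possible usage: draw `theta0 = np.random.default_rng(0).normal(size=(N, 4))` for several values of `N`, fit `y = np.sin(x)` on a grid, and compare the trained predictors or the histograms of the learned parameters. Because the time is rescaled by `N`, the step size does not need to shrink as the number of experts grows, which is what makes the large-`N` limit of the particle system meaningful.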

