MATH-PH · LG · PR · Jan 24, 2025

Mean-field limit from general mixtures of experts to quantum neural networks

arXiv:2501.14660v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This work provides theoretical insight into how MoE models behave as the number of experts grows. It is an incremental advance, mainly of interest to researchers in machine learning theory and quantum computing.

The authors study the asymptotic behavior of Mixture of Experts (MoE) models trained via gradient flow. They show that, as the number of experts diverges, the empirical measure of the experts' parameters converges to a probability measure solving a nonlinear continuity equation, with an explicit convergence rate that depends on the number of experts. They then apply this result to a MoE generated by a quantum neural network.
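For orientation, the objects named in this summary can be written in the generic form used in mean-field analyses of gradient flow. This is a sketch, not the paper's statement: it assumes that the training loss L of the N experts can be expressed as a functional \mathcal{L} of their empirical measure, and that time is rescaled by N. The paper's exact velocity field, scaling, and hypotheses may differ.

```latex
\mu^N_t = \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta_i(t)},
\qquad L(\theta_1,\dots,\theta_N) = \mathcal{L}\big(\mu^N_t\big)

\frac{d\theta_i}{dt} = -N\,\nabla_{\theta_i} L(\theta_1,\dots,\theta_N)
  = -\nabla_\theta \frac{\delta \mathcal{L}}{\delta \mu}\big[\mu^N_t\big](\theta_i)

\partial_t \mu_t = \nabla_\theta \cdot \Big( \mu_t\, \nabla_\theta \frac{\delta \mathcal{L}}{\delta \mu}[\mu_t] \Big)
```

The middle identity uses \partial_{\theta_i}\mathcal{L}(\mu^N) = \tfrac{1}{N}\nabla_\theta \tfrac{\delta\mathcal{L}}{\delta\mu}[\mu^N](\theta_i), so the factor N gives each expert an O(1) velocity. Propagation of chaos is then, in this generic picture, the statement that \mu^N_t stays close to the solution \mu_t of the last equation when the experts are initialized independently.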

In this work, we study the asymptotic behavior of Mixture of Experts (MoE) models trained via gradient flow on supervised learning problems. Our main result establishes the propagation of chaos for a MoE as the number of experts diverges. We demonstrate that the corresponding empirical measure of their parameters is close to a probability measure that solves a nonlinear continuity equation, and we provide an explicit convergence rate that depends solely on the number of experts. We apply our results to a MoE generated by a quantum neural network.
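To make the mean-field scaling concrete, here is a minimal numerical sketch. It is not the authors' construction. It assumes a hypothetical 1-D regression task (y = sin x), hypothetical linear experts e_i(x) = a_i x + b_i combined by softmax gating on scores g_i(x) = v_i x + c_i, a squared loss, and an explicit-Euler discretization of gradient flow with time rescaled by the number of experts N. Because this softmax-gated output depends on the parameters only through their empirical measure, duplicating every expert should leave the trained predictor unchanged. That kind of N-independence is what a mean-field limit formalizes. The quantum neural network experts from the paper are not modeled here.

```python
import numpy as np


def moe_forward(theta, x):
    """Softmax-gated MoE output f(x), gating probabilities, and expert outputs.

    theta has shape (N, 4); row i holds (a_i, b_i, v_i, c_i) for the
    hypothetical linear expert e_i(x) = a_i*x + b_i and gate score
    g_i(x) = v_i*x + c_i (an illustrative choice, not the paper's model).
    """
    a, b, v, c = theta.T
    experts = np.outer(x, a) + b                  # (K, N) values e_i(x_k)
    scores = np.outer(x, v) + c                   # (K, N) values g_i(x_k)
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    w = np.exp(scores)
    p = w / w.sum(axis=1, keepdims=True)          # (K, N) gating probabilities
    f = (p * experts).sum(axis=1)                 # (K,) mixture output
    return f, p, experts


def loss(theta, x, y):
    """Squared loss 0.5 * mean_k (f(x_k) - y_k)^2."""
    f, _, _ = moe_forward(theta, x)
    return 0.5 * np.mean((f - y) ** 2)


def mean_field_step(theta, x, y, dt):
    """One explicit-Euler step of gradient flow, with time rescaled by N."""
    n = theta.shape[0]
    f, p, experts = moe_forward(theta, x)
    r = (f - y)[:, None]          # (K, 1) residuals
    dev = experts - f[:, None]    # e_i - f, from differentiating the softmax
    xs = x[:, None]
    grad = np.stack([
        np.mean(r * p * xs, axis=0),        # dL/da_i
        np.mean(r * p, axis=0),             # dL/db_i
        np.mean(r * p * dev * xs, axis=0),  # dL/dv_i
        np.mean(r * p * dev, axis=0),       # dL/dc_i
    ], axis=1)
    # The factor N keeps each expert's velocity O(1) as N grows.
    return theta - dt * n * grad


def train(theta, x, y, dt=0.01, steps=500):
    for _ in range(steps):
        theta = mean_field_step(theta, x, y, dt)
    return theta


rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 64)
y = np.sin(x)
theta0 = rng.standard_normal((50, 4))

theta_n = train(theta0, x, y)
# Duplicating every expert leaves the empirical measure unchanged, so under
# the N-rescaled flow both runs should produce the same predictor.
theta_2n = train(np.repeat(theta0, 2, axis=0), x, y)
print("loss:", loss(theta0, x, y), "->", loss(theta_n, x, y))
```

The duplication check is only a consistency test of the scaling, not a convergence-rate experiment. Probing the rate would instead mean comparing independent random initializations for increasing N against a large-N reference run.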
