Mixture of Experts Soften the Curse of Dimensionality in Operator Learning
This work addresses computational efficiency in operator learning for scientific computing, though the contribution is incremental because it builds on existing neural operator frameworks.
The paper tackles the curse of dimensionality in operator learning by proposing a mixture of neural operators (MoNOs) that approximates Lipschitz operators to accuracy ε while each expert has depth, width, and rank of order O(ε⁻¹).
In this paper, we construct a mixture of neural operators (MoNOs) between function spaces whose complexity is distributed over a network of expert neural operators (NOs), with each NO satisfying parameter scaling restrictions. Our main result is a \textit{distributed} universal approximation theorem guaranteeing that any Lipschitz non-linear operator between $L^2([0,1]^d)$ spaces can be approximated uniformly over the Sobolev unit ball therein, to any given accuracy $\varepsilon>0$, by an MoNO subject to the constraint that each expert NO has a depth, width, and rank of $\mathcal{O}(\varepsilon^{-1})$. Naturally, our result implies that the required number of experts must be large; however, each NO is guaranteed to be small enough to be loaded into the active memory of most computers for reasonable accuracies $\varepsilon$. During our analysis, we also obtain new quantitative expression rates for classical NOs approximating uniformly continuous non-linear operators uniformly on compact subsets of $L^2([0,1]^d)$.
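The guarantee described above can be written schematically as follows. This is only a sketch of the statement's shape, not the paper's theorem: the symbols $G$ (the target Lipschitz operator), $\hat{G}_\varepsilon$ (the approximating MoNO), $K_s$ (the Sobolev unit ball of an order $s>0$ that the abstract does not specify), and $E$ (a single expert NO) are notation assumed here for illustration, and the same $d$ is used for both spaces as in the abstract.

\[
\sup_{u \in K_s} \bigl\| G(u) - \hat{G}_\varepsilon(u) \bigr\|_{L^2([0,1]^d)} \le \varepsilon,
\qquad
\operatorname{depth}(E),\ \operatorname{width}(E),\ \operatorname{rank}(E) \in \mathcal{O}(\varepsilon^{-1})
\ \text{ for every expert } E \text{ of } \hat{G}_\varepsilon .
\]

Read this way, the bound constrains only the size of each individual expert; the number of experts is left free to grow as $\varepsilon \to 0$, which is where the remaining complexity is placed.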