LG · May 20, 2024

Ensemble and Mixture-of-Experts DeepONets For Operator Learning

arXiv:2405.11907v511.58 citations · h-index: 3 · Has Code · Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

This work addresses the need for more accurate and efficient operator learning models in scientific machine learning, particularly for problems involving partial differential equations. The advance is incremental, since it builds on the existing DeepONet framework.

The authors improve expressivity and generalization in operator learning by introducing ensemble and partition-of-unity mixture-of-experts (PoU-MoE) DeepONet architectures. Their ensemble DeepONets achieve 2-4x lower relative ℓ2 errors than standard DeepONets and POD-DeepONets on PDE problems in two and three dimensions.

We present a novel deep operator network (DeepONet) architecture for operator learning, the ensemble DeepONet, that allows for enriching the trunk network of a single DeepONet with multiple distinct trunk networks. This trunk enrichment allows for greater expressivity and generalization capabilities over a range of operator learning problems. We also present a spatial mixture-of-experts (MoE) DeepONet trunk network architecture that utilizes a partition-of-unity (PoU) approximation to promote spatial locality and model sparsity in the operator learning problem. We first prove that both the ensemble and PoU-MoE DeepONets are universal approximators. We then demonstrate that ensemble DeepONets containing a trunk ensemble of a standard trunk, the PoU-MoE trunk, and/or a proper orthogonal decomposition (POD) trunk can achieve 2-4x lower relative $\ell_2$ errors than standard DeepONets and POD-DeepONets on both standard and challenging new operator learning problems involving partial differential equations (PDEs) in two and three dimensions. Our new PoU-MoE formulation provides a natural way to incorporate spatial locality and model sparsity into any neural network architecture, while our new ensemble DeepONet provides a powerful and general framework for incorporating basis enrichment in scientific machine learning architectures for operator learning.
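
The abstract describes the two constructions only in words, so the following is a minimal sketch of how they might fit together, not the authors' implementation. It assumes that the PoU-MoE trunk blends expert trunks with Shepard-normalized, compactly supported weights (a Wendland C2 kernel is used here as one plausible choice), that the ensemble trunk concatenates the outputs of its member trunks, and that the DeepONet output is the usual inner product of branch coefficients with trunk values. All function names, the patch layout, and the toy "experts" (plain closures standing in for neural networks) are hypothetical.

```python
import math

def wendland_c2(r):
    """Compactly supported Wendland C2 kernel (assumed choice); zero for r >= 1."""
    if r >= 1.0:
        return 0.0
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def pou_weights(y, centers, radius):
    """Shepard-normalized partition-of-unity weights at point y (they sum to 1)."""
    raw = [wendland_c2(math.dist(y, c) / radius) for c in centers]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("y is not covered by any patch")
    return [w / total for w in raw]

def pou_moe_trunk(y, centers, radius, experts):
    """Blend expert trunk outputs: t(y) = sum_j w_j(y) * t_j(y)."""
    out = None
    for w, expert in zip(pou_weights(y, centers, radius), experts):
        if w == 0.0:
            continue  # sparsity: experts whose patch excludes y are never evaluated
        t = expert(y)
        if out is None:
            out = [0.0] * len(t)
        out = [o + w * ti for o, ti in zip(out, t)]
    return out

def ensemble_trunk(y, trunks):
    """Concatenate the outputs of several trunks into one enriched basis."""
    out = []
    for trunk in trunks:
        out.extend(trunk(y))
    return out

def deeponet_output(branch_coeffs, trunk_values, bias=0.0):
    """Standard DeepONet readout: inner product of branch and trunk outputs."""
    if len(branch_coeffs) != len(trunk_values):
        raise ValueError("branch and trunk widths must match")
    return sum(b * t for b, t in zip(branch_coeffs, trunk_values)) + bias

# Toy setup: two overlapping circular patches and two hypothetical experts.
centers = [(0.25, 0.5), (0.75, 0.5)]
radius = 0.6
experts = [lambda y: [1.0, y[0]], lambda y: [y[1], y[0] * y[1]]]
standard_trunk = lambda y: [math.sin(y[0]), math.cos(y[1])]
moe_trunk = lambda y: pou_moe_trunk(y, centers, radius, experts)

y = (0.5, 0.5)
basis = ensemble_trunk(y, [standard_trunk, moe_trunk])  # width 2 + 2 = 4
branch = [0.1, 0.2, 0.3, 0.4]  # stand-in for the branch network's output
value = deeponet_output(branch, basis)
```

In this sketch the branch output must be as wide as the concatenated trunk basis, which is why enriching the trunk also widens the branch. A query point that lies in only one patch triggers only that patch's expert, which is the sense in which the blend is spatially local and sparse.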
