Generalization Bounds and Statistical Guarantees for Multi-Task and Multiple Operator Learning with MNO Networks
This work provides theoretical guarantees for multi-task and multiple operator learning. It addresses a gap in statistical generalization theory for hierarchically sampled data in operator learning, a contribution that is incremental but important for applications such as PDE foundation models.
The paper provides statistical generalization guarantees for multiple operator learning, where data are collected hierarchically across operator instances, input functions, and evaluation points. The authors derive explicit metric-entropy bounds for separable models such as Multiple Neural Operator (MNO) networks and combine them with approximation guarantees to obtain an explicit approximation-estimation tradeoff for the expected test error. The result makes the dependence on the hierarchical sampling budgets transparent and yields a sample-complexity characterization for generalization across operator instances.
Multiple operator learning concerns learning operator families $\{G[α]:U\to V\}_{α\in W}$ indexed by an operator descriptor $α$. Training data are collected hierarchically by sampling operator instances $α$, then input functions $u$ per instance, and finally evaluation points $x$ per input, yielding noisy observations of $G[α][u](x)$. While recent work has developed expressive multi-task and multiple operator learning architectures and approximation-theoretic scaling laws, quantitative statistical generalization guarantees remain limited. We provide a covering-number-based generalization analysis for separable models, focusing on the Multiple Neural Operator (MNO) architecture: we first derive explicit metric-entropy bounds for hypothesis classes given by linear combinations of products of deep ReLU subnetworks, and then combine these complexity bounds with approximation guarantees for MNO to obtain an explicit approximation-estimation tradeoff for the expected test error on new (unseen) triples $(α,u,x)$. The resulting bound makes the dependence on the hierarchical sampling budgets $(n_α,n_u,n_x)$ transparent and yields an explicit learning-rate statement in the operator-sampling budget $n_α$, providing a sample-complexity characterization for generalization across operator instances. The structure and architecture can also be viewed as a general-purpose solver or as an example of a "small" PDE foundation model, where the triples are one form of multi-modality.
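To make the hierarchical sampling scheme and the separable hypothesis class concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a toy target operator, a finite-dimensional discretization of each input function $u$ (sensor values), and a model of the form $\sum_k c_k\, a_k(α)\, b_k(u)\, \tau_k(x)$ with small ReLU subnetworks. The names `sample_hierarchical`, `separable_model`, `alpha_net`, `u_net`, `x_net`, and all dimensions are hypothetical choices for illustration; the exact MNO architecture may differ.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (W, b) pairs for a ReLU MLP with the given layer sizes."""
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def relu_mlp(params, z):
    """Evaluate a ReLU MLP on the last axis of z (linear output layer)."""
    h = z
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b

def sample_hierarchical(rng, n_alpha, n_u, n_x, d_alpha=2, d_u=8, noise=0.01):
    """Sample alpha, then n_u inputs per alpha, then n_x points per input.

    The target G[alpha][u](x) = sum(alpha) * mean(u) * sin(pi x) is a toy
    stand-in (an assumption), observed with additive Gaussian noise.
    """
    alphas = rng.uniform(-1.0, 1.0, size=(n_alpha, d_alpha))
    us = rng.normal(size=(n_alpha, n_u, d_u))            # discretized inputs
    xs = rng.uniform(0.0, 1.0, size=(n_alpha, n_u, n_x, 1))
    clean = (alphas.sum(-1)[:, None, None]
             * us.mean(-1)[:, :, None]
             * np.sin(np.pi * xs[..., 0]))
    ys = clean + noise * rng.normal(size=clean.shape)
    return alphas, us, xs, ys

def separable_model(theta, alphas, us, xs):
    """Linear combination of products of three ReLU subnetwork outputs."""
    a = relu_mlp(theta["alpha_net"], alphas)   # shape (n_alpha, K)
    b = relu_mlp(theta["u_net"], us)           # shape (n_alpha, n_u, K)
    t = relu_mlp(theta["x_net"], xs)           # shape (n_alpha, n_u, n_x, K)
    return np.einsum("k,ak,auk,auxk->aux", theta["coeffs"], a, b, t)

rng = np.random.default_rng(0)
n_alpha, n_u, n_x, K = 4, 3, 5, 6
alphas, us, xs, ys = sample_hierarchical(rng, n_alpha, n_u, n_x)
theta = {
    "alpha_net": init_mlp(rng, [2, 16, K]),
    "u_net": init_mlp(rng, [8, 16, K]),
    "x_net": init_mlp(rng, [1, 16, K]),
    "coeffs": rng.normal(size=K),
}
pred = separable_model(theta, alphas, us, xs)
# Empirical risk over all n_alpha * n_u * n_x noisy observations.
mse = float(np.mean((pred - ys) ** 2))
```

The total number of observations is $n_α n_u n_x$, but only $n_α$ distinct operator instances are seen. This is why a rate stated in $n_α$ is the relevant quantity for generalization to unseen operators.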