NA, May 30
Solver-in-the-Loop joint operator learning: fractional Laplace-Beltrami features for interface reconstruction
Yangyang Zheng, Huayi Wei, Shuhao Cao et al.
In this work, we propose a joint operator learning method for reconstructing images of conductivity coefficients from boundary data. Inspired by the idea of employing partial differential equation (PDE) solvers as preconditioners for this inverse problem, we investigate a "solver-in-the-loop" training mechanism. It allows the learnable parameters integrated in a PDE solver module to interact with those in the neural networks that reconstruct the images. Specifically, we employ a fractional Laplace-Beltrami operator with a learnable fractional order, which transforms boundary data into high-dimensional features. These features then serve as input to a neural network, significantly improving reconstruction accuracy. For this purpose, a Learning-Automated FEM (LA-FEM) package, which facilitates this "solver-in-the-loop" property, is developed with PyTorch as a backend. The new LA-FEM module allows gradients of an objective function, computed by automatic differentiation, to propagate freely through both the PDE solver for the forward problem and the coupled neural networks for the inverse problem.
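To make the feature map in this abstract concrete, here is a minimal spectral sketch of a fractional Laplace-Beltrami operator. It is not the LA-FEM implementation. It assumes the boundary is the unit circle sampled on a uniform grid, where the operator acts on Fourier mode k by multiplication with |k|^(2s). The function names `fractional_laplace_beltrami_circle` and `boundary_features` are hypothetical, and the order `s` is a plain argument here rather than a trained parameter.

```python
import numpy as np

def fractional_laplace_beltrami_circle(g, s):
    """Apply (-Delta)^s to samples g on a uniform grid of the unit circle.

    Assumption: the boundary is the unit circle, so the Laplace-Beltrami
    eigenfunctions are Fourier modes with eigenvalues k^2.
    """
    n = g.shape[-1]
    # Integer wavenumbers in FFT ordering: 0, 1, ..., -2, -1.
    k = np.abs(np.fft.fftfreq(n, d=1.0 / n))
    symbol = np.zeros_like(k)
    nonzero = k > 0
    # Multiplier |k|^(2s); the constant mode (k = 0) is mapped to zero.
    symbol[nonzero] = k[nonzero] ** (2.0 * s)
    return np.real(np.fft.ifft(symbol * np.fft.fft(g, axis=-1), axis=-1))

def boundary_features(g, orders):
    """Stack the outputs for several fractional orders as feature channels."""
    return np.stack([fractional_laplace_beltrami_circle(g, s) for s in orders])

# Usage: one boundary signal turned into three feature channels.
theta = 2.0 * np.pi * np.arange(64) / 64
g = np.cos(3.0 * theta)
features = boundary_features(g, orders=[-0.5, 0.5, 1.0])
```

In an automatic differentiation framework, `s` could be declared as a trainable parameter so that gradients flow through the multiplier |k|^(2s), which is the kind of coupling the abstract describes.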
NA, Oct 23, 2018
Anisotropic Error Estimates of The Linear Virtual Element Method on Polygonal Meshes
Shuhao Cao, Long Chen
A refined a priori error analysis of the lowest order (linear) Virtual Element Method (VEM) is developed for approximating a model two dimensional Poisson problem. A set of new geometric assumptions is proposed on shape regularity of polygonal meshes. A new universal error equation for the lowest order (linear) VEM is derived for any choice of stabilization, and a new stabilization using broken half-seminorm is introduced to incorporate short edges naturally into the a priori error analysis on isotropic elements. The error analysis is then extended to a special class of anisotropic elements with high aspect ratio originating from a body-fitted mesh generator, which uses straight lines to cut a shape regular background mesh. Lastly, some commonly used tools for triangular elements are revisited for polygonal elements to give an in-depth view of these estimates' dependence on shapes.
NA, Dec 29, 2018
Anisotropic Error Estimates of The Linear Nonconforming Virtual Element Methods
Shuhao Cao, Long Chen
A refined a priori error analysis of the lowest order (linear) nonconforming Virtual Element Method (VEM) for approximating a model Poisson problem is developed in both 2D and 3D. A set of new geometric assumptions is proposed on shape regularity of polytopal meshes. A new error equation for the lowest order (linear) nonconforming VEM is derived for any choice of stabilization, and a new stabilization using a projection on an extended element patch is introduced for the error analysis on anisotropic elements.
NA, Jun 7, 2016
Robust A Posteriori Error Estimation for Finite Element Approximation to H(curl) Problem
Zhiqiang Cai, Shuhao Cao, Rob Falgout
In this paper, we introduce a novel a posteriori error estimator for the conforming finite element approximation to the H(curl) problem with inhomogeneous media and with the right-hand side only in L^2. The estimator is of the recovery type. Independently of the current approximation to the primary variable (the electric field), an auxiliary variable (the magnetizing field) is recovered in parallel by solving a similar H(curl) problem. An alternative way of recovery, which localizes the error flux, is presented as well. The estimator is then defined as the sum of the modified element residual and the residual of the constitutive equation defining the auxiliary variable. It is proved that the estimator is approximately equal to the true error in the energy norm without the quasi-monotonicity assumption. Finally, we present numerical results for two H(curl) interface problems.
LG, Sep 29, 2022
Transformer Meets Boundary Value Inverse Problems
Ruchi Guo, Shuhao Cao, Long Chen
A Transformer-based deep direct sampling method is proposed for electrical impedance tomography, a well-known severely ill-posed nonlinear boundary value inverse problem. A real-time reconstruction is achieved by evaluating the learned inverse operator between carefully designed data and the reconstructed images. An effort is made to give a specific example for a fundamental question: whether and how one can benefit from the theoretical structure of a mathematical problem to develop task-oriented and structure-conforming deep neural networks. Specifically, inspired by direct sampling methods for inverse problems, the 1D boundary data in different frequencies are preprocessed by a partial differential equation-based feature map to yield 2D harmonic extensions as different input channels. Then, by introducing learnable non-local kernels, the direct sampling is recast as a modified attention mechanism. The new method achieves superior accuracy over its predecessors and contemporary operator learners and shows robustness to noise in benchmarks. This research should strengthen the insight that, despite being invented for natural language processing tasks, the attention mechanism offers great flexibility to be modified in conformity with the a priori mathematical knowledge, which ultimately leads to the design of more physics-compatible neural architectures.
LG, Oct 19, 2022
Mitigating spectral bias for the multiscale operator learning
Xinliang Liu, Bo Xu, Shuhao Cao et al.
Neural operators have emerged as a powerful tool for learning the mapping between infinite-dimensional parameter and solution spaces of partial differential equations (PDEs). In this work, we focus on multiscale PDEs that have important applications such as reservoir modeling and turbulence prediction. We demonstrate that for such PDEs, the spectral bias towards low-frequency components presents a significant challenge for existing neural operators. To address this challenge, we propose a hierarchical attention neural operator (HANO) inspired by the hierarchical matrix approach. HANO features a scale-adaptive interaction range and self-attentions over a hierarchy of levels, enabling nested feature computation with controllable linear cost and encoding/decoding of multiscale solution space. We also incorporate an empirical $H^1$ loss function to enhance the learning of high-frequency components. Our numerical experiments demonstrate that HANO outperforms state-of-the-art (SOTA) methods for representative multiscale problems.
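As an illustration of why an $H^1$-type loss emphasizes high-frequency components, the following is a minimal sketch of a relative discrete $H^1$ error computed in Fourier space. It is not taken from the HANO code. It assumes periodic 2D fields on a uniform square grid, and the function name `h1_loss` is hypothetical. Each Fourier mode of the error is weighted by $1 + |k|^2$, so an error of a given $L^2$ size costs more when it sits at a higher frequency.

```python
import numpy as np

def h1_loss(pred, target):
    """Relative discrete H^1 error of 2D periodic fields on an n x n grid.

    Assumption: periodic boundary conditions, so the H^1 norm is a weighted
    sum of squared Fourier coefficients with weight 1 + |k|^2.
    """
    n = pred.shape[-1]
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    weight = 1.0 + kx**2 + ky**2
    err_hat = np.fft.fft2(pred - target)
    tgt_hat = np.fft.fft2(target)
    num = np.sum(weight * np.abs(err_hat) ** 2)
    den = np.sum(weight * np.abs(tgt_hat) ** 2)
    return np.sqrt(num / den)

# Usage: two perturbations with equal L^2 norm but different frequencies.
n = 32
x = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
target = np.sin(X) + np.cos(Y)
loss_low = h1_loss(target + 0.1 * np.sin(X), target)
loss_high = h1_loss(target + 0.1 * np.sin(8.0 * X), target)
```

With these inputs `loss_high` exceeds `loss_low`, even though both perturbations have the same $L^2$ norm.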
LG, May 26, 2025
Advanced Long-term Earth System Forecasting
Hao Wu, Yuan Gao, Ruijian Gou et al.
Reliable long-term forecasting of Earth system dynamics is fundamentally limited by instabilities in current artificial intelligence (AI) models during extended autoregressive simulations. These failures often originate from inherent spectral bias, leading to inadequate representation of critical high-frequency, small-scale processes and subsequent uncontrolled error amplification. Inspired by the nested grids in numerical models used to resolve small scales, we present TritonCast. At the core of its design is a dedicated latent dynamical core, which ensures the long-term stability of the macro-evolution at a coarse scale. An outer structure then fuses this stable trend with fine-grained local details. This design effectively mitigates the spectral bias caused by cross-scale interactions. In atmospheric science, it achieves state-of-the-art accuracy on the WeatherBench 2 benchmark while demonstrating exceptional long-term stability: executing year-long autoregressive global forecasts and completing multi-year climate simulations that span the entire available $2500$-day test period without drift. In oceanography, it extends skillful eddy forecast to $120$ days and exhibits unprecedented zero-shot cross-resolution generalization. Ablation studies reveal that this performance stems from the synergistic interplay of the architecture's core components. TritonCast thus offers a promising pathway towards a new generation of trustworthy, AI-driven simulations. This significant advance has the potential to accelerate discovery in climate and Earth system science, enabling more reliable long-term forecasting and deeper insights into complex geophysical dynamics.
CV, Feb 8, 2022
How to Understand Masked Autoencoders
Shuhao Cao, Peng Xu, David A. Clifton
"Masked Autoencoders (MAE) Are Scalable Vision Learners" revolutionizes self-supervised learning in that it not only achieves state-of-the-art results for image pre-training, but is also a milestone that bridges the gap between visual and linguistic masked autoencoding (BERT-style) pre-training. However, to our knowledge, to date there are no theoretical perspectives that explain the powerful expressivity of MAE. In this paper, we, for the first time, propose a unified theoretical framework that provides a mathematical understanding of MAE. Specifically, we explain the patch-based attention approaches of MAE using an integral kernel under a non-overlapping domain decomposition setting. To help the research community further comprehend the main reasons for the great success of MAE, based on our framework, we pose five questions and answer them with mathematical rigor using insights from operator theory.
LG, Sep 21, 2021
Neural networks with trainable matrix activation functions
Zhengqi Liu, Shuhao Cao, Yuwen Li et al.
The training process of neural networks usually optimizes the weights and bias parameters of linear transformations, while nonlinear activation functions are pre-specified and fixed. This work develops a systematic approach to constructing matrix-valued activation functions whose entries are generalized from ReLU. The activation is based on matrix-vector multiplications using only scalar multiplications and comparisons. The proposed activation functions depend on parameters that are trained along with the weights and bias vectors. Neural networks based on this approach are simple and efficient and are shown to be robust in numerical experiments.
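One way to read "matrix-valued activation functions whose entries are generalized from ReLU" is as a diagonal matrix D(x) applied to x, where each diagonal entry is a piecewise-constant function of the corresponding input. The sketch below follows that reading. It is an assumption and not the authors' code. The name `matrix_activation` and the parameters `thresholds` and `slopes` are hypothetical, and the same slopes are shared across all coordinates for brevity.

```python
import numpy as np

def matrix_activation(x, thresholds, slopes):
    """Compute D(x) @ x for a diagonal, piecewise-constant D(x).

    thresholds: sorted breakpoints, length m.
    slopes:     values of the diagonal entry on each of the m + 1 intervals;
                in training these would be learnable parameters.
    """
    # Comparisons only: find the interval that contains each entry of x.
    idx = np.searchsorted(thresholds, x, side="left")
    # Scalar multiplications only: the diagonal of D(x) times x.
    return slopes[idx] * x

# Usage: ReLU is the special case with one breakpoint at 0 and slopes (0, 1).
x = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
relu_like = matrix_activation(x, np.array([0.0]), np.array([0.0, 1.0]))
# A three-piece variant with slopes that could be trained.
three_piece = matrix_activation(x, np.array([0.0, 1.0]), np.array([0.1, 1.0, 0.5]))
```

Because the evaluation needs only interval lookups and elementwise products, it matches the abstract's description of using scalar multiplications and comparisons.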
LG, May 31, 2021
Choose a Transformer: Fourier or Galerkin
Shuhao Cao
In this paper, we apply the self-attention from the state-of-the-art Transformer in "Attention Is All You Need" for the first time to a data-driven operator learning problem related to partial differential equations. An effort is made to explain the heuristics of the attention mechanism and to improve its efficacy. By employing the operator approximation theory in Hilbert spaces, it is demonstrated for the first time that the softmax normalization in the scaled dot-product attention is sufficient but not necessary. Without softmax, the approximation capacity of a linearized Transformer variant can be proved to be comparable to a Petrov-Galerkin projection layer-wise, and the estimate is independent of the sequence length. A new layer normalization scheme mimicking the Petrov-Galerkin projection is proposed to allow a scaling to propagate through attention layers, which helps the model achieve remarkable accuracy in operator learning tasks with unnormalized data. Finally, we present three operator learning experiments, including the viscous Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem. The newly proposed simple attention-based operator learner, Galerkin Transformer, shows significant improvements in both training cost and evaluation accuracy over its softmax-normalized counterparts.
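The softmax-free attention described in this abstract can be sketched as follows. This is a minimal single-head NumPy illustration and not the released Galerkin Transformer code. It assumes layer normalization without learnable scale and shift applied to the keys and values, and the function names are hypothetical. Forming the small d x d matrix K^T V first makes the cost linear in the sequence length n.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row over the feature dimension (no learnable affine)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def galerkin_attention(q, k, v):
    """Softmax-free attention Q (K~^T V~) / n for inputs of shape (n, d)."""
    n = q.shape[0]
    # (d, d) product first: O(n d^2) work instead of O(n^2 d).
    kv = layer_norm(k).T @ layer_norm(v)
    return q @ kv / n

# Usage with random queries, keys, and values.
rng = np.random.default_rng(0)
n, d = 50, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = galerkin_attention(q, k, v)
```

By associativity, the same output is obtained from the quadratic-cost ordering (Q K~^T) V~ / n, which is why dropping softmax permits the cheaper evaluation order.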
NA, Oct 1, 2018
A note on the error estimate of the virtual element methods
Shuhao Cao, Long Chen, Frank Lin
This short note reports a new derivation of the optimal order of the a priori error estimates for conforming virtual element methods (VEM) on 3D polyhedral meshes based on an error equation. The geometric assumptions, which are necessary for the optimal order of the conforming VEM error estimate in the $H^1$-seminorm, are relaxed for that in a bilinear form-induced energy norm.
NA, Sep 8, 2015
A Recovery-Based A Posteriori Error Estimator for H(curl) Interface Problems
Zhiqiang Cai, Shuhao Cao
This paper introduces a new recovery-based a posteriori error estimator for the lowest order Nedelec finite element approximation to the H(curl) interface problem. The error estimator is analyzed by establishing both the reliability and the efficiency bounds and is supported by numerical results. Under certain assumptions, it is proved that the reliability and efficiency constants are independent of the jumps of the coefficients.