CLSep 21, 2023Code
AceGPT, Localizing Large Language Models in ArabicHuang Huang, Fei Yu, Jianqing Zhu et al.
This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models. Significant concerns emerge when addressing cultural sensitivity and local values. To address this, the paper proposes a comprehensive solution that includes further pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic, alongside Reinforcement Learning with AI Feedback (RLAIF) employing a reward model attuned to local culture and values. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities. Comprehensive evaluations reveal that the resulting model, dubbed `AceGPT', sets the state-of-the-art standard for open Arabic LLMs across various benchmarks. Codes, data, and models are in https://github.com/FreedomIntelligence/AceGPT.
NAJul 25, 2018
ReLU Deep Neural Networks and Linear Finite ElementsJuncai He, Lin Li, Jinchao Xu et al.
In this paper, we investigate the relationship between deep neural networks (DNN) with rectified linear unit (ReLU) function as the activation function and continuous piecewise linear (CPWL) functions, especially CPWL functions from the simplicial linear finite element method (FEM). We first consider the special case of FEM. By exploring the DNN representation of its nodal basis functions, we present a ReLU DNN representation of CPWL in FEM. We theoretically establish that at least $2$ hidden layers are needed in a ReLU DNN to represent any linear finite element functions in $Ω\subseteq \mathbb{R}^d$ when $d\ge2$. Consequently, for $d=2,3$ which are often encountered in scientific and engineering computing, the minimal number of two hidden layers are necessary and sufficient for any CPWL function to be represented by a ReLU DNN. Then we include a detailed account on how a general CPWL in $\mathbb R^d$ can be represented by a ReLU DNN with at most $\lceil\log_2(d+1)\rceil$ hidden layers and we also give an estimation of the number of neurons in DNN that are needed in such a representation. Furthermore, using the relationship between DNN and FEM, we theoretically argue that a special class of DNN models with low bit-width are still expected to have an adequate representation power in applications. Finally, as a proof of concept, we present some numerical results for using ReLU DNNs to solve a two point boundary problem to demonstrate the potential of applying DNN for numerical solution of partial differential equations.
NANov 10, 2016
Algebraic Multigrid MethodsJinchao Xu, Ludmil T Zikatanov
This paper is to give an overview of AMG methods for solving large scale systems of equations such as those from the discretization of partial differential equations. AMG is often understood as the acronym of "Algebraic Multi-Grid", but it can also be understood as "Abstract Muti-Grid". Indeed, as it demonstrates in this paper, how and why an algebraic multigrid method can be better understood in a more abstract level. In the literature, there are a variety of different algebraic multigrid methods that have been developed from different perspectives. In this paper, we try to develop a unified framework and theory that can be used to derive and analyze different algebraic multigrid methods in a coherent manner. Given a smoother $R$ for a matrix $A$, such as Gauss-Seidel or Jacobi, we prove that the optimal coarse space of dimension $n_c$ is the span of the eigen-vectors corresponding to the first $n_c$ eigenvalues of $\bar RA$ (with $\bar R=R+R^T-R^TAR$). We also prove that this optimal coarse space can be obtained by a constrained trace-minimization problem for a matrix associated with $\bar RA$ and demonstrate that coarse spaces of most of existing AMG methods can be viewed some approximate solution of this trace-minimization problem. Furthermore, we provide a general approach to the construction of a quasi-optimal coarse space and we prove that under appropriate assumptions the resulting two-level AMG method for the underlying linear system converges uniformly with respect to the size of the problem, the coefficient variation, and the anisotropy. Our theory applies to most existing multigrid methods, including the standard geometric multigrid method, the classic AMG, energy-minimization AMG, unsmoothed and smoothed aggregation AMG, and spectral AMGe.
NAApr 21, 2016
High-Order Extended Finite Element Methods for Solving Interface ProblemsFei Wang, Yuanming Xiao, Jinchao Xu
In this paper, we study arbitrary order extended finite element (XFE) methods based on two discontinuous Galerkin (DG) schemes in order to solve elliptic interface problems in two and three dimensions. Optimal error estimates in the piecewise $H^1$-norm and in the $L^2$-norm are rigorously proved for both schemes. In particular, we have devised a new parameter-friendly DG-XFEM method, which means that no "sufficiently large" parameters are needed to ensure the optimal convergence of the scheme. To prove the stability of bilinear forms, we derive non-standard trace and inverse inequalities for high-order polynomials on curved sub-elements divided by the interface. All the estimates are independent of the location of the interface relative to the meshes. Numerical examples are given to support the theoretical results.
NAApr 24, 2018
A Unified Study of Continuous and Discontinuous Galerkin MethodsQingguo Hong, Fei Wang, Shuonan Wu et al.
A unified study is presented in this paper for the design and analysis of different finite element methods (FEMs), including conforming and nonconforming FEMs, mixed FEMs, hybrid FEMs,discontinuous Galerkin (DG) methods, hybrid discontinuous Galerkin (HDG) methods and weak Galerkin (WG) methods. Both HDG and WG are shown to admit inf-sup conditions that hold uniformly with respect to both mesh and penalization parameters. In addition, by taking the limit of the stabilization parameters, a WG method is shown to converge to a mixed method whereas an HDG method is shown to converge to a primal method. Furthermore, a special class of DG methods, known as the mixed DG methods, is presented to fill a gap revealed in the unified framework.
NAOct 4, 2014
Stable Finite Element Methods Preserving $\nabla \cdot \boldsymbol{B} = 0$ Exactly for MHD ModelsKaibo Hu, Yicong Ma, Jinchao Xu
This paper is devoted to the design and analysis of some structure-preserving finite element schemes for the magnetohydrodynamics (MHD) system. The main feature of the method is that it naturally preserves the important Gauss law, namely $\nabla\cdot\boldsymbol{B}=0$. In contrast to most existing approaches that eliminate the electrical field variable $\boldsymbol{E}$ and give a direct discretization of the magnetic field, our new approach discretizes the electric field $\boldsymbol{E}$ by Nédélec type edge elements for $H(\mathrm{curl})$, while the magnetic field $\boldsymbol{B}$ by Raviart-Thomas type face elements for $H(\mathrm{div})$. As a result, the divergence-free condition on the magnetic field holds exactly on the discrete level. For this new finite element method, an energy stability estimate can be naturally established in an analogous way as in the continuous case. Furthermore, well-posedness is rigorously established in the paper for both the Picard and Newton linearization of the fully nonlinear systems by using the Brezzi theory for both the continuous and discrete cases. This well-posedness naturally leads to robust (and optimal) preconditioners for the linearized systems.
NANov 5, 2017
On the Stability and Accuracy of Partially and Fully Implicit Schemes for Phase Field ModelingJinchao Xu, Yukun Li, Shuonan Wu et al.
We study in this paper the accuracy and stability of partially and fully implicit schemes for phase field modeling. Through theoretical and numerical analysis of Allen-Cahn and Cahn-Hillard models, we investigate the potential problems of using partially implicit schemes, demonstrate the importance of using fully implicit schemes and discuss the limitation of energy stability that are often used to evaluate the quality of a numerical scheme for phase-field modeling. In particular, we make the following observations: 1. a convex splitting scheme (CSS in short) can be equivalent to some fully implicit scheme (FIS in short) with a much different time scaling and thus it may lack numerical accuracy; 2. most implicit schemes (in discussions) are energy-stable if the time-step size is sufficiently small; 3. a traditionally known conditionally energy-stable scheme still possess an unconditionally energy-stable physical solution; 4. an unconditionally energy-stable scheme is not necessarily better than a conditionally energy-stable scheme when the time step size is not small enough; 5. a first-order FIS for the Allen-Cahn model can be devised so that the maximum principle will be valid on the discrete level and hence the discrete phase variable satisfies $|u_h(x)|\le 1$ for all $x$ and, furthermore, the linearized discretized system can be effectively preconditioned by discrete Poisson operators.
NAMar 9, 2015
Robust Preconditioners for Incompressible MHD ModelsYicong Ma, Kaibo Hu, Xiaozhe Hu et al.
In this paper, we develop two classes of robust preconditioners for the structure-preserving discretization of the incompressible magnetohydrodynamics (MHD) system. By studying the well-posedness of the discrete system, we design block preconditioners for them and carry out rigorous analysis on their performance. We prove that such preconditioners are robust with respect to most physical and discretization parameters. In our proof, we improve the existing estimates of the block triangular preconditioners for saddle point problems by removing the scaling parameters, which are usually difficult to choose in practice. This new technique is not only applicable to the MHD system, but also to other problems. Moreover, we prove that Krylov iterative methods with our preconditioners preserve the divergence-free condition exactly, which complements the structure-preserving discretization. Another feature is that we can directly generalize this technique to other discretizations of the MHD system. We also present preliminary numerical results to support the theoretical results and demonstrate the robustness of the proposed preconditioners.
NAFeb 23, 2019
A Mixed Discontinuous Galerkin Method for Linear Elasticity with Strongly Imposed SymmetryFei Wang, Shuonan Wu, Jinchao Xu
In this paper, we study a mixed discontinuous Galerkin (MDG) method to solve linear elasticity problem with arbitrary order discontinuous finite element spaces in $d$-dimension ($d=2,3$). This method uses polynomials of degree $k+1$ for the stress and of degree $k$ for the displacement ($k\geq 0$). The mixed DG scheme is proved to be well-posed under proper norms. Specifically, we prove that, for any $k \geq 0$, the $H({\rm div})$-like error estimate for the stress and $L^2$ error estimate for the displacement are optimal. We further establish the optimal $L^2$ error estimate for the stress provided that the $\mathcal{P}_{k+2}-\mathcal{P}_{k+1}^{-1}$ Stokes pair is stable and $k \geq d$. We also provide numerical results of MDG showing that the orders of convergence are actually sharp.
NAOct 17, 2017
Structure-preserving Finite Element Methods for Stationary MHD ModelsKaibo Hu, Jinchao Xu
In this paper, we develop a class of mixed finite element scheme for stationary magnetohydrodynamics (MHD) models, using magnetic field $\bm B$ and current density $\bm j$ as the discretization variables. We show that the Gauss's law for the magnetic field, namely $\nabla\cdot\bm{B}=0$, and the energy law for the entire system are exactly preserved in the finite element schemes. Based on some new basic estimates for $H^{h}(\mathrm{div})$, we show that the new finite element scheme is well-posed. Furthermore, we show the existence of solutions to the nonlinear problems and the convergence of Picard iterations and finite element methods under some conditions.
LGAug 9, 2022
On the Activation Function Dependence of the Spectral Bias of Neural NetworksQingguo Hong, Jonathan W. Siegel, Qinyang Tan et al.
Neural networks are universal function approximators which are known to generalize well despite being dramatically overparameterized. We study this phenomenon from the point of view of the spectral bias of neural networks. Our contributions are two-fold. First, we provide a theoretical explanation for the spectral bias of ReLU neural networks by leveraging connections with the theory of finite element methods. Second, based upon this theory we predict that switching the activation function to a piecewise linear B-spline, namely the Hat function, will remove this spectral bias, which we verify empirically in a variety of settings. Our empirical studies also show that neural networks with the Hat activation function are trained significantly faster using stochastic gradient descent and ADAM. Combined with previous work showing that the Hat activation function also improves generalization accuracy on image classification tasks, this indicates that using the Hat activation provides significant advantages over the ReLU on certain problems.
NAJan 9, 2018
Nonconforming Finite Element Spaces for $2m$-th Order Partial Differential Equations on $\mathbb{R}^n$ Simplicial Grids When $m=n+1$Shuonan Wu, Jinchao Xu
In this paper, we propose a family of nonconforming finite elements for $2m$-th order partial differential equations in $\mathbb{R}^n$ on simplicial grids when $m=n+1$. This family of nonconforming elements naturally extends the elements proposed by Wang and Xu [Math. Comp. 82(2013), pp. 25-43] , where $m \leq n$ is required. We prove the unisolvent property by induction on the dimensions using the similarity properties of both shape function spaces and degrees of freedom. The proposed elements have approximability, pass the generalized patch test and hence converge. We also establish quasi-optimal error estimates in the broken $H^3$ norm for the 2D nonconforming element. In addition, we propose an $H^3$ nonconforming finite element that is robust for the sixth order singularly perturbed problems in 2D. These theoretical results are further validated by the numerical tests for the 2D tri-harmonic problem.
NAApr 9, 2013
A simple preconditioner for a discontinuous Galerkin method for the Stokes problemBlanca Ayuso de Dios, Franco Brezzi, L. Donatella Marini et al.
In this paper we construct Discontinuous Galerkin approximations of the Stokes problem where the velocity field is H(div)-conforming. This implies that the velocity solution is divergence-free in the whole domain. This property can be exploited to design a simple and effective preconditioner for the final linear system.
MATH-PHFeb 28, 2017
Multiphase Allen-Cahn and Cahn-Hilliard Models and Their Discretizations with the Effect of Pairwise Surface TensionsShuonan Wu, Jinchao Xu
In this paper, the mathematical properties and numerical discretizations of multiphase models that simulate the phase separation of an $N$-component mixture are studied. For the general choice of phase variables, the unisolvent property of the coefficient matrix involved in the $N$-phase models based on the pairwise surface tensions is established. Moreover, the symmetric positive-definite property of the coefficient matrix on an $(N-1)$-dimensional hyperplane --- which is of fundamental importance to the well-posedness of the models --- can be proved equivalent to some physical condition for pairwise surface tensions. The $N$-phase Allen-Cahn and $N$-phase Cahn-Hilliard equations can then be derived from the free-energy functional. A natural property is that the resulting dynamics of concentrations are independent of phase variables chosen. Finite element discretizations for $N$-phase models can be obtained as a natural extension of the existing discretizations for the two-phase model. The discrete energy law of the numerical schemes can be proved and numerically observed under some restrictions pertaining to time step size. Numerical experiments including the spinodal decomposition and the evolution of triple junctions are described in order to investigate the effect of pairwise surface tensions.
NAMar 3, 2012
Optimal solvers for fourth-order PDEs discretized on unstructured gridsShuo Zhang, Jinchao Xu
This paper provides the first provable $\mathcal{O}(N \log N)$ algorithms for the linear system arising from the direct finite element discretization of the fourth-order equation with different boundary conditions on unstructured grids of size $N$ on an arbitrary polygoanl domain. Several preconditioners are presented, and the conjugate gradient methods applied with these preconditioners are proven to converge uniformly with respect to the size of the preconditioned linear system. One main ingredient of the optimal preconditioners is a mixed-form discretization of the fourth-order problem. Such a mixed-form discretization leads to a non-desirable ---either non-optimal or non-convergent--- approximation of the original solution, but it provides optimal preconditioners for the direct finite element problem. It is further shown that the implementation of the preconditioners can be reduced to the solution of several discrete Poisson equations. Therefore, any existing optimal or nearly optimal solver, such as geometric or algebraic multigrid methods, for Poisson equations would lead to a nearly optimal solver for the discrete fourth-order system. A number of nonstandard Sobolev spaces and their discretizations defined on the boundary of polygonal domains are carefully studied and used for the analysis of those preconditioners.
NAApr 25, 2017
New Hybridized Mixed Methods for Linear Elasticity and Optimal Multilevel SolversShihua Gong, Shuonan Wu, Jinchao Xu
In this paper, we present a family of new mixed finite element methods for linear elasticity for both spatial dimensions $n=2,3$, which yields a conforming and strongly symmetric approximation for stress. Applying $\mathcal{P}_{k+1}-\mathcal{P}_k$ as the local approximation for the stress and displacement, the mixed methods achieve the optimal order of convergence for both the stress and displacement when $k \ge n$. For the lower order case $(n-2\le k<n)$, the stability and convergence still hold on some special grids. The proposed mixed methods are efficiently implemented by hybridization, which imposes the inter-element normal continuity of the stress by a Lagrange multiplier. Then, we develop and analyze multilevel solvers for the Schur complement of the hybridized system in the two dimensional case. Provided that no nearly singular vertex on the grids, the proposed solvers are proved to be uniformly convergent with respect to both the grid size and Poisson's ratio. Numerical experiments are provided to validate our theoretical results.
NANov 5, 2015
Modeling and Simulation for Fluid-Rotating Structure InteractionKai Yang, Pengtao Sun, Lu Wang et al.
In this paper, we study a dynamic fluid-structure interaction (FSI) model for an elastic structure that is immersed and spinning in the fluid. We develop a linear constitutive model to describe the motion of a rotational elastic structure which is suitable for the application of arbitrary Lagrangian-Eulerian (ALE) method in FSI simulation. Additionally, a novel ALE mapping method is designed to generate the moving fluid mesh while the deformable structure spins in a non-axisymmetric fluid channel. The structure velocity is adopted as the principle unknown to form a monolithic saddle-point system together with fluid velocity and pressure. We discretize the nonlinear saddle-point system with mixed finite element method and Newton's linearization, and prove that the derived saddle-point problem is well-posed. The developed methodology is applied to a self-defined elastic structure and a realistic hydro-turbine under a prescribed angular velocity. Both illustrate the satisfactory numerical results of an elastic structure that is deforming and rotating while interacting with the fluid. The numerical validation is also conducted to demonstrate the modeling consistency.
NAFeb 15, 2013
Combined Preconditioning with Applications in Reservoir SimulationXiaozhe Hu, Shuhong Wu, Xiao-Hui Wu et al.
We develop a simple algorithmic framework to solve large-scale symmetric positive definite linear systems. At its core, the framework relies on two components: (1) a norm-convergent iterative method (i.e. smoother) and (2) a preconditioner. The resulting preconditioner, which we refer to as a combined preconditioner, is much more robust and efficient than the iterative method and preconditioner when used in Krylov subspace methods. We prove that the combined preconditioner is positive definite and show estimates on the condition number of the preconditioned system. We combine an algebraic multigrid method and an incomplete factorization preconditioner to test the proposed framework on problems in petroleum reservoir simulation. Our numerical experiments demonstrate noticeable speed-up when we compare our combined method with the standalone algebraic multigrid method or the incomplete factorization preconditioner.
NAJun 22, 2011
Lower Bounds of the Discretization for Piecewise PolynomialsQun Lin, Hehu Xie, Jinchao Xu
Assume that $V_h$ is a space of piecewise polynomials of degree less than $r\geq 1$ on a family of quasi-uniform triangulation of size $h$. Then the following well-known upper bound holds for a sufficiently smooth function $u$ and $p\in [1, \infty]$ $$ \inf_{v_h\in V_h}\|u-v_h\|_{j,p,Ω,h} \le C h^{r-j} |u|_{r,p,Ω},\quad 0\le j\le r. $$ In this paper, we prove that, roughly speaking, if $u\not\in V_h$, the above estimate is sharp. Namely, $$ \inf_{v_h\in V_h}\|u-v_h\|_{j,p,Ω,h} \ge c h^{r-j},\quad 0\le j\le r, \ \ 1\leq p\leq \infty, $$ for some $c>0$. The above result is further extended to various situations including more general Sobolev space norms, general shape regular grids and many different types of finite element spaces. As an application, the sharpness of finite element approximation of elliptic problems and the corresponding eigenvalue problems is established.
NAFeb 15, 2013
Comparative Convergence Analysis of Nonlinear AMLI-cycle MultigridXiaozhe Hu, Panayot S. Vassilevski, Jinchao Xu
The main purpose of this paper is to provide a comprehensive convergence analysis of nonlinear AMLI-cycle multigrid method for symmetric positive definite problems. Based on classical assumptions for approximation and smoothing properties, we show that the nonlinear AMLI-cycle MG method is uniformly convergent. Furthermore, under only the assumption that the smoother is convergent, we show that the nonlinear AMLI-cycle method is always better (or not worse) than the respective V-cycle MG method. Finally, numerical experiments are presented to illustrate the theoretical results.
NAFeb 10, 2012
Local Multilevel Preconditioners for Elliptic Equations with Jump Coefficients on Bisection GridsLong Chen, Michael Holst, Jinchao Xu et al.
The goal of this paper is to design optimal multilevel solvers for the finite element approximation of second order linear elliptic problems with piecewise constant coefficients on bisection grids. Local multigrid and BPX preconditioners are constructed based on local smoothing only at the newest vertices and their immediate neighbors. The analysis of eigenvalue distributions for these local multilevel preconditioned systems shows that there are only a fixed number of eigenvalues which are deteriorated by the large jump. The remaining eigenvalues are bounded uniformly with respect to the coefficients and the meshsize. Therefore, the resulting preconditioned conjugate gradient algorithm will converge with an asymptotic rate independent of the coefficients and logarithmically with respect to the meshsize. As a result, the overall computational complexity is nearly optimal.
NAJul 16, 2018
Generalized Gaffney inequality and discrete compactness for discrete differential formsJuncai He, Kaibo Hu, Jinchao Xu
We prove generalized Gaffney inequalities and the discrete compactness for finite element differential forms on $s$-regular domains, including general Lipschitz domains. In computational electromagnetism, special cases of these results have been established for edge elements with weakly imposed divergence-free conditions and used in the analysis of nonlinear and eigenvalue problems. In this paper, we generalize these results to discrete differential forms, not necessarily with strongly or weakly imposed constraints. The analysis relies on a new Hodge mapping and its approximation property. As an application, we show $L^{p}$ estimates for several finite element approximations of the scalar and vector Laplacian problems.
NASep 6, 2012
On Adaptive Eulerian-Lagrangian Method for Linear Convection-Diffusion ProblemsXiaozhe Hu, Young-Ju Lee, Jinchao Xu et al.
In this paper, we consider the adaptive Eulerian--Lagrangian method (ELM) for linear convection-diffusion problems. Unlike the classical a posteriori error estimations, we estimate the temporal error along the characteristics and derive a new a posteriori error bound for ELM semi-discretization. With the help of this proposed error bound, we are able to show the optimal convergence rate of ELM for solutions with minimal regularity. Furthermore, by combining this error bound with a standard residual-type estimator for the spatial error, we obtain a posteriori error estimators for a fully discrete scheme. We present numerical tests to demonstrate the efficiency and robustness of our adaptive algorithm.
NAMay 5, 2011
Analysis of two-level method for anisotropic diffusion equations on aligned and non-aligned gridsGuozhu Yu, Jinchao Xu, Ludmil Zikatanov
This paper is devoted to the multigrid convergence analysis for the linear systems arising from the conforming linear finite element discretization of the second order elliptic equations with anisotropic diffusion. The multigrid convergence behavior is known to strongly depend on whether the discretization grid is aligned or non-aligned with the anisotropic direction and analyses in the paper will be mainly focused on two-level algorithms. For an aligned grid case, a lower bound is given for point-wise smoother which shows deterioration of convergence rate. In both aligned and non-aligned cases we show that for a specially designed block smoother the convergence is uniform with respect to both anisotropy ratio and mesh size in the energy norm. The analysis is complemented with numerical experiments which confirm the theoretical results
AIFeb 2, 2023
FV-MgNet: Fully Connected V-cycle MgNet for Interpretable Time Series ForecastingJianqing Zhu, Juncai He, Lian Zhang et al.
By investigating iterative methods for a constrained linear model, we propose a new class of fully connected V-cycle MgNet for long-term time series forecasting, which is one of the most difficult tasks in forecasting. MgNet is a CNN model that was proposed for image classification based on the multigrid (MG) methods for solving discretized partial differential equations (PDEs). We replace the convolutional operations with fully connected operations in the existing MgNet and then apply them to forecasting problems. Motivated by the V-cycle structure in MG, we further propose the FV-MgNet, a V-cycle version of the fully connected MgNet, to extract features hierarchically. By evaluating the performance of FV-MgNet on popular data sets and comparing it with state-of-the-art models, we show that the FV-MgNet achieves better results with less memory usage and faster inference speed. In addition, we develop ablation experiments to demonstrate that the structure of FV-MgNet is the best choice among the many variants.
76.3NAMay 23
A quasi-monolithic localized high-order ALE finite element method for multi-scale fluid-structure interaction problemsLingyue Shen, Qi Xin, Yan Chen et al.
This paper presents a quasi-monolithic localized high-order arbitrary Lagrangian-Eulerian (qMLH-ALE) finite element method for multi-scale fluid-structure interaction (FSI) in microfluidic systems. The fluid momentum, the incompressible Neo-Hookean constitutive law, and the left Cauchy-Green tensor $\mathcal{B}$ are assembled into a single implicit system, while the harmonic mesh extension is updated explicitly in a staggered manner. Isoparametric $\mathcal{P}_2$ elements provide third-order geometric approximation of curved fluid-solid interfaces, and a second-order implicit-explicit partitioned Runge-Kutta scheme delivers second-order temporal accuracy without the dissipation of backward Euler. A localized updating strategy confines the moving mesh and the deformation history to a body-fitted sub-domain coupled with a precomputed steady background flow, bridging the scale disparity between local FSI dynamics and the macroscopic microchannel geometry. The Turek-Hron FSI3 benchmark, performed at unit fluid-solid density ratio, reproduces the reference beam-tip amplitude and frequency within $3\%$, confirming stability under the strong added-mass coupling that destabilizes conventional partitioned schemes. Three-dimensional particle-focusing simulations in spiral microchannels further illustrate the framework on long-range multi-scale problems.
NAMar 15, 2015
Energetically stable discretizations for charge carrier transport and electrokinetic modelsChun Liu, Maximilian Metti, Jinchao Xu
A finite element discretization using a method of lines approached is proposed for approximately solving the Poisson-Nernst-Planck (PNP) equations. This discretization scheme enforces positivity of the computed solutions, corresponding to particle density functions, and a discrete energy estimate is established that resembles the familiar energy law for the PNP system. This energy estimate is extended to finite element solutions to an electrokinetic model, which couples the PNP system with the Navier-Stokes equations. Numerical experiments are conducted to validate convergence of the computed solution and verify the discrete energy estimate.
NAFeb 25, 2016
Fast multilevel solvers for a class of discrete fourth order parabolic problemsBin Zheng, Luoping Chen, Xiaozhe Hu et al.
In this paper, we study fast iterative solvers for the solution of fourth order parabolic equations discretized by mixed finite element methods. We propose to use consistent mass matrix in the discretization and use lumped mass matrix to construct efficient preconditioners. We provide eigenvalue analysis for the preconditioned system and estimate the convergence rate of the preconditioned GMRes method. Furthermore, we show that these preconditioners only need to be solved inexactly by optimal multigrid algorithms. Our numerical examples indicate that the proposed preconditioners are very efficient and robust with respect to both discretization parameters and diffusion coefficients. We also investigate the performance of multigrid algorithms with either collective smoothers or distributive smoothers when solving the preconditioner systems.
NADec 6, 2012
A Parallel Auxiliary Grid AMG Method for GPULu Wang, Xiaozhe Hu, Jonathan Cohen et al.
In this paper, we develop a new parallel auxiliary grid algebraic multigrid (AMG) method to leverage the power of graphic processing units (GPUs). In the construction of the hierarchical coarse grid, we use a simple and fixed coarsening procedure based on a region quadtree generated from an auxiliary grid. This allows us to explicitly control the sparsity patterns and operator complexities of the AMG solver. This feature provides (nearly) optimal load balancing and predictable communication patterns, which makes our new algorithm suitable for parallel computing, especially on GPU. We also design a parallel smoother based on the special coloring of the quadtree to accelerate the convergence rate and improve the parallel performance of this solver. Based on the CUDA toolkit [40], we implemented our new parallel auxiliary grid AMG method on GPU and the numerical results of this implementation demonstrate the efficiency of our new method. The results achieve an average speedup of over 4 on quasi-uniform grids and 2 on shape regular grids when compared to the AMG implementation in CUSP.
NAApr 5, 2013
Block Triangular Preconditioning for Stochastic Galerkin MethodBin Zheng, Guang Lin, Jinchao Xu
In this paper we study fast iterative solvers for the large sparse linear systems resulting from the stochastic Galerkin discretization of stochastic partial differential equations. A block triangular preconditioner is introduced and applied to the Krylov subspace methods, including the generalized minimum residual method and the generalized preconditioned conjugate gradient method. This preconditioner utilizes the special structures of the stochastic Galerkin matrices to achieve high efficiency. Spectral bounds for the preconditioned matrix are provided for convergence analysis. The preconditioner system can be solved approximately by geometric multigrid V-cycle. Numerical results indicate that the block triangular preconditioner has better performance than the traditional block diagonal preconditioner for stochastic problems with large variance.
LGOct 16, 2023
MgNO: Efficient Parameterization of Linear Operators via MultigridJuncai He, Xinliang Liu, Jinchao Xu
In this work, we propose a concise neural operator architecture for operator learning. Drawing an analogy with a conventional fully connected neural network, we define the neural operator as follows: the output of the $i$-th neuron in a nonlinear operator layer is defined by $O_i(u) = σ\left( \sum_j W_{ij} u + B_{ij}\right)$. Here, $ W_{ij}$ denotes the bounded linear operator connecting $j$-th input neuron to $i$-th output neuron, and the bias $ B_{ij}$ takes the form of a function rather than a scalar. Given its new universal approximation property, the efficient parameterization of the bounded linear operators between two neurons (Banach spaces) plays a critical role. As a result, we introduce MgNO, utilizing multigrid structures to parameterize these linear operators between neurons. This approach offers both mathematical rigor and practical expressivity. Additionally, MgNO obviates the need for conventional lifting and projecting operators typically required in previous neural operators. Moreover, it seamlessly accommodates diverse boundary conditions. Our empirical observations reveal that MgNO exhibits superior ease of training compared to other CNN-based models, while also displaying a reduced susceptibility to overfitting when contrasted with spectral-type neural operators. We demonstrate the efficiency and accuracy of our method with consistently state-of-the-art performance on different types of partial differential equations (PDEs).
MLAug 20, 2024
Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon TransformTong Mao, Jonathan W. Siegel, Jinchao Xu
Let $Ω\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(Ω))$ with error measured in the $L_q(Ω)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.
CLDec 4, 2024Code
Alignment at Pre-training! Towards Native Alignment for Arabic LLMsJuhao Liang, Zhenyang Cai, Jianqing Zhu et al.
The alignment of large language models (LLMs) is critical for developing effective and safe language models. Traditional approaches focus on aligning models during the instruction tuning or reinforcement learning stages, referred to in this paper as `post alignment'. We argue that alignment during the pre-training phase, which we term `native alignment', warrants investigation. Native alignment aims to prevent unaligned content from the beginning, rather than relying on post-hoc processing. This approach leverages extensively aligned pre-training data to enhance the effectiveness and usability of pre-trained models. Our study specifically explores the application of native alignment in the context of Arabic LLMs. We conduct comprehensive experiments and ablation studies to evaluate the impact of native alignment on model performance and alignment stability. Additionally, we release open-source Arabic LLMs that demonstrate state-of-the-art performance on various benchmarks, providing significant benefits to the Arabic LLM community.
CLDec 16, 2024Code
Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary ExpansionJianqing Zhu, Huang Huang, Zhihang Lin et al.
This paper addresses the critical need for democratizing large language models (LLM) in the Arab world, a region that has seen slower progress in developing models comparable to state-of-the-art offerings like GPT-4 or ChatGPT 3.5, due to a predominant focus on mainstream languages (e.g., English and Chinese). One practical objective for an Arabic LLM is to utilize an Arabic-specific vocabulary for the tokenizer that could speed up decoding. However, using a different vocabulary often leads to a degradation of learned knowledge since many words are initially out-of-vocabulary (OOV) when training starts. Inspired by the vocabulary learning during Second Language (Arabic) Acquisition for humans, the released AraLLaMA employs progressive vocabulary expansion, which is implemented by a modified BPE algorithm that progressively extends the Arabic subwords in its dynamic vocabulary during training, thereby balancing the OOV ratio at every stage. The ablation study demonstrated the effectiveness of Progressive Vocabulary Expansion. Moreover, AraLLaMA achieves decent performance comparable to the best Arabic LLMs across a variety of Arabic benchmarks. Models, training data, benchmarks, and codes will be all open-sourced.
NANov 4, 2025
Condition Numbers and Eigenvalue Spectra of Shallow Networks on SpheresXinliang Liu, Tong Mao, Jinchao Xu
We present an estimation of the condition numbers of the \emph{mass} and \emph{stiffness} matrices arising from shallow ReLU$^k$ neural networks defined on the unit sphere~$\mathbb{S}^d$. In particular, when $\{θ_j^*\}_{j=1}^n \subset \mathbb{S}^d$ is \emph{antipodally quasi-uniform}, the condition number is sharp. Indeed, in this case, we obtain sharp asymptotic estimates for the full spectrum of eigenvalues and characterize the structure of the corresponding eigenspaces, showing that the smallest eigenvalues are associated with an eigenbasis of low-degree polynomials while the largest eigenvalues are linked to high-degree polynomials. This spectral analysis establishes a precise correspondence between the approximation power of the network and its numerical stability.
77.8NAApr 14
Sharp inf-sup estimate for the Stokes equation in tight domains with periodic pillars and some numerical implicationsQi Xin, Shihua Gong, Jinchao Xu
The predictive simulation of fluid dynamics in densely packed microfluidic devices, such as Deterministic Lateral Displacement (DLD) arrays, is severely bottlenecked by the stagnation of standard iterative solvers. In this paper, we reveal that this failure is not an algorithmic artifact, but fundamentally rooted in the pre-asymptotic degradation of the pressure-velocity coupling stability. By rigorously analyzing periodic pillar geometries in this generalized lattice framework, we prove that the continuous Ladyzhenskaya-Babuška-Brezzi (LBB) condition, also called the inf-sup constant, deteriorates exactly as $m^{-1}$ up to a positive multiplicative constant, where $m$ is the pillar density (the number of pillars per unit length). This causes a severe a priori error amplification and extreme ill-conditioning in Schur complement of the saddle point system. To overcome this theoretical limit, we propose a parameter-free, adaptively scaled Augmented Lagrangian (AL) stabilization strategy. Extensive numerical experiments on both standard square and highly asymmetric DLD arrays validate our theoretical bounds and demonstrate the robustness of the proposed AL method.
23.8OCApr 3
Randomized subspace correction methods for convex optimizationBoou Jiang, Jongho Park, Jinchao Xu
This paper introduces an abstract framework for randomized subspace correction methods for convex optimization, which unifies and generalizes a broad class of existing algorithms, including domain decomposition, multigrid, and block coordinate descent methods. We provide a convergence rate analysis ranging from minimal assumptions to more practical settings, such as sharpness and strong convexity. While most existing studies on block coordinate descent methods focus on nonoverlapping decompositions and smooth or strongly convex problems, our framework extends to more general settings involving arbitrary space decompositions, inexact local solvers, and problems with weaker smoothness or convexity assumptions. The proposed framework is broadly applicable to convex optimization problems arising in areas such as nonlinear partial differential equations, imaging, and data science.
NADec 21, 2023
Deep Neural Networks and Finite Elements of Any Order on Arbitrary DimensionsJuncai He, Jinchao Xu
In this study, we establish that deep neural networks employing ReLU and ReLU$^2$ activation functions can effectively represent Lagrange finite element functions of any order on various simplicial meshes in arbitrary dimensions. We introduce two novel formulations for globally expressing the basis functions of Lagrange elements, tailored for both specific and arbitrary meshes. These formulations are based on a geometric decomposition of the elements, incorporating several insightful and essential properties of high-dimensional simplicial meshes, barycentric coordinate functions, and global basis functions of linear elements. This representation theory facilitates a natural approximation result for such deep neural networks. Our findings present the first demonstration of how deep neural networks can systematically generate general continuous piecewise polynomial functions on both specific or arbitrary simplicial meshes.
LGDec 27, 2023
Expressivity and Approximation Properties of Deep Neural Networks with ReLU$^k$ ActivationJuncai He, Tong Mao, Jinchao Xu
In this paper, we investigate the expressivity and approximation properties of deep neural networks employing the ReLU$^k$ activation function for $k \geq 2$. Although deep ReLU networks can approximate polynomials effectively, deep ReLU$^k$ networks have the capability to represent higher-degree polynomials precisely. Our initial contribution is a comprehensive, constructive proof for polynomial representation using deep ReLU$^k$ networks. This allows us to establish an upper bound on both the size and count of network parameters. Consequently, we are able to demonstrate a suboptimal approximation rate for functions from Sobolev spaces as well as for analytic functions. Additionally, through an exploration of the representation power of deep ReLU$^k$ networks for shallow networks, we reveal that deep ReLU$^k$ networks can approximate functions from a range of variation spaces, extending beyond those generated solely by the ReLU$^k$ activation function. This finding demonstrates the adaptability of deep ReLU$^k$ networks in approximating functions within various variation spaces.
LGNov 16, 2025
On the Dimension-Free Approximation of Deep Neural Networks for Symmetric Korobov FunctionsYulong Lu, Tong Mao, Jinchao Xu et al.
Deep neural networks have been widely used as universal approximators for functions with inherent physical structures, including permutation symmetry. In this paper, we construct symmetric deep neural networks to approximate symmetric Korobov functions and prove that both the convergence rate and the constant prefactor scale at most polynomially with respect to the ambient dimension. This represents a substantial improvement over prior approximation guarantees that suffer from the curse of dimensionality. Building on these approximation bounds, we further derive a generalization-error rate for learning symmetric Korobov functions whose leading factors likewise avoid the curse of dimensionality.
NAOct 5, 2025
Sharp Lower Bounds for Linearized ReLU^k Approximation on the SphereTong Mao, Jinchao Xu
We prove a saturation theorem for linearized shallow ReLU$^k$ neural networks on the unit sphere $\mathbb S^d$. For any antipodally quasi-uniform set of centers, if the target function has smoothness $r>\tfrac{d+2k+1}{2}$, then the best $\mathcal{L}^2(\mathbb S^d)$ approximation cannot converge faster than order $n^{-\frac{d+2k+1}{2d}}$. This lower bound matches existing upper bounds, thereby establishing the exact saturation order $\tfrac{d+2k+1}{2d}$ for such networks. Our results place linearized neural-network approximation firmly within the classical saturation framework and show that, although ReLU$^k$ networks outperform finite elements under equal degrees $k$, this advantage is intrinsically limited.
LGAug 28, 2025
Self-Composing Neural Operators with Depth and Accuracy Scaling via Adaptive Train-and-Unroll ApproachJuncai He, Xinliang Liu, Jinchao Xu
In this work, we propose a novel framework to enhance the efficiency and accuracy of neural operators through self-composition, offering both theoretical guarantees and practical benefits. Inspired by iterative methods in solving numerical partial differential equations (PDEs), we design a specific neural operator by repeatedly applying a single neural operator block, we progressively deepen the model without explicitly adding new blocks, improving the model's capacity. To train these models efficiently, we introduce an adaptive train-and-unroll approach, where the depth of the neural operator is gradually increased during training. This approach reveals an accuracy scaling law with model depth and offers significant computational savings through our adaptive training strategy. Our architecture achieves state-of-the-art (SOTA) performance on standard benchmarks. We further demonstrate its efficacy on a challenging high-frequency ultrasound computed tomography (USCT) problem, where a multigrid-inspired backbone enables superior performance in resolving complex wave phenomena. The proposed framework provides a computationally tractable, accurate, and scalable solution for large-scale data-driven scientific machine learning applications.
LGMay 17, 2023
DualFL: A Duality-based Federated Learning Algorithm with Communication Acceleration in the General Convex RegimeJongho Park, Jinchao Xu
We propose a new training algorithm, named DualFL (Dualized Federated Learning), for solving distributed optimization problems in federated learning. DualFL achieves communication acceleration for very general convex cost functions, thereby providing a solution to an open theoretical problem in federated learning concerning cost functions that may not be smooth nor strongly convex. We provide a detailed analysis for the local iteration complexity of DualFL to ensure the overall computational efficiency of DualFL. Furthermore, we introduce a completely new approach for the convergence analysis of federated learning based on a dual formulation. This new technique enables concise and elegant analysis, which contrasts the complex calculations used in existing literature on convergence of federated learning algorithms.
CVDec 14, 2021
An Interpretive Constrained Linear Model for ResNet and MgNetJuncai He, Jinchao Xu, Lian Zhang et al.
We propose a constrained linear data-feature-mapping model as an interpretable mathematical model for image classification using a convolutional neural network (CNN). From this viewpoint, we establish detailed connections between the traditional iterative schemes for linear systems and the architectures of the basic blocks of ResNet- and MgNet-type models. Using these connections, we present some modified ResNet models that compared with the original models have fewer parameters and yet can produce more accurate results, thereby demonstrating the validity of this constrained learning data-feature-mapping assumption. Based on this assumption, we further propose a general data-feature iterative scheme to show the rationality of MgNet. We also provide a systematic numerical study on MgNet to show its success and advantages in image classification problems and demonstrate its advantages in comparison with established networks.
LGSep 1, 2021
Approximation Properties of Deep ReLU CNNsJuncai He, Lin Li, Jinchao Xu
This paper focuses on establishing $L^2$ approximation properties for deep ReLU convolutional neural networks (CNNs) in two-dimensional space. The analysis is based on a decomposition theorem for convolutional kernels with a large spatial size and multi-channels. Given the decomposition result, the property of the ReLU activation function, and a specific structure for channels, a universal approximation theorem of deep ReLU CNNs with classic structure is obtained by showing its connection with one-hidden-layer ReLU neural networks (NNs). Furthermore, approximation properties are obtained for one version of neural networks with ResNet, pre-act ResNet, and MgNet architecture based on connections between these networks.
MLJun 28, 2021
Characterization of the Variation Spaces Corresponding to Shallow Neural NetworksJonathan W. Siegel, Jinchao Xu
We study the variation space corresponding to a dictionary of functions in $L^2(Ω)$ for a bounded domain $Ω\subset \mathbb{R}^d$. Specifically, we compare the variation space, which is defined in terms of a convex hull with related notions based on integral representations. This allows us to show that three important notions relating to the approximation theory of shallow neural networks, the Barron space, the spectral Barron space, and the Radon BV space, are actually variation spaces with respect to certain natural dictionaries.
MLJun 28, 2021
Sharp Lower Bounds on the Approximation Rate of Shallow Neural NetworksJonathan W. Siegel, Jinchao Xu
We consider the approximation rates of shallow neural networks with respect to the variation norm. Upper bounds on these rates have been established for sigmoidal and ReLU activation functions, but it has remained an important open problem whether these rates are sharp. In this article, we provide a solution to this problem by proving sharp lower bounds on the approximation rates for shallow neural networks, which are obtained by lower bounding the $L^2$-metric entropy of the convex hull of the neural network basis functions. In addition, our methods also give sharp lower bounds on the Kolmogorov $n$-widths of this convex hull, which show that the variation spaces corresponding to shallow neural networks cannot be efficiently approximated by linear methods. These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU. Our results also quantify how much stronger the Barron spectral norm is than the variation norm and, combined with previous results, give the asymptotics of the $L^\infty$-metric entropy up to logarithmic factors in the case of the ReLU activation function.
NAMay 10, 2021
ReLU Deep Neural Networks from the Hierarchical Basis PerspectiveJuncai He, Lin Li, Jinchao Xu
We study ReLU deep neural networks (DNNs) by investigating their connections with the hierarchical basis method in finite element methods. First, we show that the approximation schemes of ReLU DNNs for $x^2$ and $xy$ are composition versions of the hierarchical basis approximation for these two functions. Based on this fact, we obtain a geometric interpretation and systematic proof for the approximation result of ReLU DNNs for polynomials, which plays an important role in a series of recent exponential approximation results of ReLU DNNs. Through our investigation of connections between ReLU DNNs and the hierarchical basis approximation for $x^2$ and $xy$, we show that ReLU DNNs with this special structure can be applied only to approximate quadratic functions. Furthermore, we obtain a concise representation to explicitly reproduce any linear finite element function on a two-dimensional uniform mesh by using ReLU DNNs with only two hidden layers.
MLJan 29, 2021
Sharp Bounds on the Approximation Rates, Metric Entropy, and $n$-widths of Shallow Neural NetworksJonathan W. Siegel, Jinchao Xu
In this article, we study approximation properties of the variation spaces corresponding to shallow neural networks with a variety of activation functions. We introduce two main tools for estimating the metric entropy, approximation rates, and $n$-widths of these spaces. First, we introduce the notion of a smoothly parameterized dictionary and give upper bounds on the non-linear approximation rates, metric entropy and $n$-widths of their absolute convex hull. The upper bounds depend upon the order of smoothness of the parameterization. This result is applied to dictionaries of ridge functions corresponding to shallow neural networks, and they improve upon existing results in many cases. Next, we provide a method for lower bounding the metric entropy and $n$-widths of variation spaces which contain certain classes of ridge functions. This result gives sharp lower bounds on the $L^2$-approximation rates, metric entropy, and $n$-widths for variation spaces corresponding to neural networks with a range of important activation functions, including ReLU$^k$ activation functions and sigmoidal activation functions with bounded variation.
CVAug 21, 2020
Training Sparse Neural Networks using Compressed SensingJonathan W. Siegel, Jianhong Chen, Pengchuan Zhang et al.
Pruning the weights of neural networks is an effective and widely-used technique for reducing model size and inference complexity. We develop and test a novel method based on compressed sensing which combines the pruning and training into a single step. Specifically, we utilize an adaptively weighted $\ell^1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks. The adaptive weighting we introduce corresponds to a novel regularizer based on the logarithm of the absolute value of the weights. We perform a series of ablation studies demonstrating the improvement provided by the adaptive weighting and generalized RDA algorithm. Furthermore, numerical experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that our method 1) trains sparser, more accurate networks than existing state-of-the-art methods; 2) can be used to train sparse networks from scratch, i.e. from a random initialization, as opposed to initializing with a well-trained base model; 3) acts as an effective regularizer, improving generalization accuracy.