NA May 14, 2016
Interior Penalty Discontinuous Galerkin Methods for Second Order Linear Non-Divergence Form Elliptic PDEs
Xiaobing Feng, Michael Neilan, Stefan Schnake
This paper develops interior penalty discontinuous Galerkin (IP-DG) methods to approximate $W^{2,p}$ strong solutions of second order linear elliptic partial differential equations (PDEs) in non-divergence form with continuous coefficients. The proposed IP-DG methods are closely related to the IP-DG methods for advection-diffusion equations, and they are easy to implement on existing standard IP-DG software platforms. It is proved that the proposed IP-DG methods have unique solutions and converge at the optimal rate to the $W^{2,p}$ strong solution in a discrete $W^{2,p}$-norm. The crux of the analysis is to establish a DG discrete counterpart of the Calderón-Zygmund estimate and to adapt, at the discrete level, a freezing coefficient technique used in the PDE analysis. As a byproduct of our analysis, we also establish broken $W^{1,p}$-norm error estimates for IP-DG approximations of constant coefficient elliptic PDEs. Numerical experiments are provided to gauge the performance of the proposed IP-DG methods and to validate the theoretical convergence results.
NA Oct 10, 2016
An enhanced finite element method for a class of variational problems exhibiting the Lavrentiev gap phenomenon
Xiaobing Feng, Stefan Schnake
This paper develops an enhanced finite element method for approximating a class of variational problems which exhibit the Lavrentiev gap phenomenon, in the sense that the minimum values of the energy functional differ by a nontrivial gap when the functional is minimized over the spaces $W^{1,1}$ and $W^{1,\infty}$. To remedy the standard finite element method, which fails to converge for such variational problems, a simple and effective cut-off procedure is used to design the (enhanced finite element) discrete energy functional. In essence, the proposed discrete energy functional curbs the gap phenomenon by capping the derivatives of its input on a scale of $O(h^{-\alpha})$ (where $h$ denotes the mesh size) for some positive constant $\alpha$. A sufficient condition is proposed for determining the problem-dependent parameter $\alpha$. Extensive 1-D and 2-D numerical experiments are provided to show the convergence behavior and the performance of the proposed enhanced finite element method.
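The capping idea can be illustrated with a minimal Python sketch. It assumes a 1-D piecewise linear function on a uniform mesh and a plain symmetric clip at $h^{-\alpha}$; the function names and the test function $x^{1/3}$ are hypothetical choices for illustration, and the cut-off actually used in the paper may be defined differently.

```python
import numpy as np

def elementwise_slopes(u, x):
    """Slopes of the piecewise linear interpolant of nodal values u on nodes x."""
    return np.diff(u) / np.diff(x)

def capped_derivative(s, h, alpha):
    """Clip derivative values s to the interval [-h**(-alpha), h**(-alpha)].

    Assumption: a plain symmetric clip stands in for the cut-off procedure.
    """
    cap = h ** (-alpha)
    return np.clip(s, -cap, cap)

# Illustrative data: x**(1/3) has an unbounded derivative at x = 0.
x = np.linspace(0.0, 1.0, 11)
u = x ** (1.0 / 3.0)
h = x[1] - x[0]
slopes = elementwise_slopes(u, x)
capped = capped_derivative(slopes, h, alpha=0.5)
```

Only the slope on the element touching the singularity exceeds the cap here, so the clip leaves the remaining elements unchanged.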
NA Feb 26, 2019
Analysis of the Vanishing Moment Method and its Finite Element Approximations for Second-order Linear Elliptic PDEs in Non-divergence Form
Xiaobing Feng, Thomas Lewis, Stefan Schnake
This paper is concerned with continuous and discrete approximations of $W^{2,p}$ strong solutions of second-order linear elliptic partial differential equations (PDEs) in non-divergence form. The continuous approximation of these equations is achieved through the Vanishing Moment Method (VMM), which adds a small biharmonic term to the PDE. Unlike the original second-order equation, the new fourth-order PDE is a natural fit for Galerkin-type methods, since its highest-order term is in divergence form. The well-posedness of the weak form of the perturbed fourth-order equation is shown, as well as error estimates for approximating the strong solution of the original second-order PDE. A $C^1$ finite element method is then proposed for the fourth-order equation, and existence and uniqueness of its solutions, as well as optimal error estimates in the $H^2$ norm, are shown. Lastly, numerical tests are given to show the validity of the method.
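A schematic form of the perturbation described above may help. The sign convention and the symbols $A$, $f$, $\Omega$, and $\varepsilon$ are assumptions introduced here for illustration; the abstract does not state them, and the boundary conditions are omitted for the same reason.

```latex
% Original second-order problem in non-divergence form (assumed sign convention):
-A : D^2 u = f \quad \text{in } \Omega,
% Vanishing moment approximation with a small parameter \varepsilon > 0:
\varepsilon \Delta^2 u^{\varepsilon} - A : D^2 u^{\varepsilon} = f \quad \text{in } \Omega.
% The biharmonic term is in divergence form,
\Delta^2 u^{\varepsilon} = \operatorname{div}\bigl(\nabla \Delta u^{\varepsilon}\bigr),
% so it can be integrated by parts against a test function in a weak formulation.
```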
NA Jan 17, 2018
A Discontinuous Ritz Method for a Class of Calculus of Variations Problems
Xiaobing Feng, Stefan Schnake
This paper develops an analogue (or counterpart) to discontinuous Galerkin (DG) methods for approximating a general class of calculus of variations problems. The proposed method, called the discontinuous Ritz (DR) method, constructs a numerical solution by minimizing a discrete energy over DG function spaces. The discrete energy includes standard penalization terms as well as the DG finite element (DG-FE) numerical derivatives developed recently by Feng, Lewis, and Neilan in [Feng2013]. It is proved that the proposed DR method converges and that the DG-FE numerical derivatives exhibit a compactness property which is desirable and crucial for applying the proposed DR method to problems with more complex energy functionals. Numerical tests are provided on the classical $p$-Laplace problem to gauge the performance of the proposed DR method.
LG May 12, 2025
Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks
Steffen Schotthöfer, H. Lexie Yang, Stefan Schnake
Deployment of neural networks on resource-constrained devices demands models that are both compact and robust to adversarial inputs. However, compression and adversarial robustness often conflict. In this work, we introduce a dynamical low-rank training scheme enhanced with a novel spectral regularizer that controls the condition number of the low-rank core in each layer. This approach mitigates the sensitivity of compressed models to adversarial perturbations without sacrificing accuracy on clean data. The method is model- and data-agnostic, computationally efficient, and supports rank adaptivity to automatically compress the network at hand. Extensive experiments across standard architectures, datasets, and adversarial attacks show the regularized networks can achieve over 94% compression while recovering or improving adversarial accuracy relative to uncompressed baselines.
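One way a penalty can control the condition number of a low-rank core is sketched below in Python. This is a hypothetical regularizer chosen for illustration, not necessarily the one used in the paper: it measures how far $S^\top S$ is from a scaled identity, which vanishes exactly when all singular values of the core $S$ are equal, that is, when its condition number is 1.

```python
import numpy as np

def spectral_penalty(S):
    """Squared Frobenius distance of S^T S from a scaled identity.

    Assumption: S is a square r x r core matrix and the scale is the mean
    squared singular value, ||S||_F^2 / r. The penalty is zero if and only
    if all singular values of S coincide.
    """
    r = S.shape[1]
    alpha_sq = np.linalg.norm(S, "fro") ** 2 / r
    gram = S.T @ S
    return np.linalg.norm(gram - alpha_sq * np.eye(r), "fro") ** 2

# A scaled orthogonal core is perfectly conditioned; a skewed diagonal core is not.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
well_conditioned = spectral_penalty(3.0 * Q)
ill_conditioned = spectral_penalty(np.diag([1.0, 100.0]))
```

In training, such a term would be added to the loss with a weight, so that gradient steps pull the singular values of each core toward a common value.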
LG Mar 31
Tucker Attention: A generalization of approximate attention mechanisms
Timon Klein, Jonas Kusch, Sebastian Sager et al.
The pursuit of reducing the memory footprint of the self-attention mechanism in multi-head self-attention (MHA) has spawned a rich portfolio of methods, e.g., grouped-query attention (GQA) and multi-head latent attention (MLA). These methods leverage specialized low-rank factorizations across embedding dimensions or attention heads. From the point of view of classical low-rank approximation, these methods are unconventional and raise the questions of which objects they really approximate and how to interpret the low-rank behavior of the resulting representations. To answer these questions, this work proposes a generalized view of the weight objects in the self-attention layer and a factorization strategy, which allows us to construct a parameter-efficient scheme called Tucker Attention. Tucker Attention requires an order of magnitude fewer parameters for comparable validation metrics, compared to GQA and MLA, as evaluated in LLM and ViT test cases. Additionally, Tucker Attention encompasses GQA, MLA, and MHA as special cases and is fully compatible with flash-attention and rotary position embeddings (RoPE). This generalization strategy yields insights into the actual ranks achieved by MHA, GQA, and MLA, and further enables simplifications for MLA.
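For readers unfamiliar with the Tucker format, the following Python sketch shows a generic Tucker factorization of a three-way weight tensor and its parameter count. The tensor layout (heads by model dimension by head dimension), the sizes, and the ranks are assumptions made for illustration; the abstract does not specify how the paper arranges or factorizes the attention weights.

```python
import numpy as np

def tucker_reconstruct(core, U_heads, U_model, U_head_dim):
    """Rebuild the full tensor from a Tucker core and one factor matrix per mode."""
    return np.einsum("abc,ia,jb,kc->ijk", core, U_heads, U_model, U_head_dim)

def tucker_param_count(shape, ranks):
    """Number of stored parameters: core entries plus all factor matrices."""
    core = int(np.prod(ranks))
    factors = sum(n * r for n, r in zip(shape, ranks))
    return core + factors

# Hypothetical sizes: 8 heads, model dimension 64, head dimension 16.
shape = (8, 64, 16)
ranks = (2, 8, 4)
rng = np.random.default_rng(0)
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((n, r)) for n, r in zip(shape, ranks)]
W = tucker_reconstruct(core, *factors)

dense_params = int(np.prod(shape))
tucker_params = tucker_param_count(shape, ranks)
```

With these sizes the dense tensor stores 8192 entries while the Tucker form stores 656, which shows where the parameter savings of such a factorization come from.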