Jonas Kusch

LG
h-index: 18
12 papers
98 citations
Novelty: 61%
AI Score: 56

12 Papers

86.3 | NA | May 29
A stable multiplicative dynamical low-rank discretization for the linear Boltzmann-BGK equation

Lena Baumann, Lukas Einkemmer, Christian Klingenberg et al.

The numerical method of dynamical low-rank approximation (DLRA) has recently been applied to various kinetic equations, showing a significant reduction of the computational effort. In this paper, we apply this concept to the linear Boltzmann-Bhatnagar-Gross-Krook (Boltzmann-BGK) equation which, due to its high dimensionality, is challenging to solve. Inspired by the special structure of the non-linear Boltzmann-BGK problem, we consider a multiplicative splitting of the distribution function. We propose a rank-adaptive DLRA scheme making use of the basis update & Galerkin integrator and combine it with an additional basis augmentation to ensure numerical stability, for which an analytical proof is given and a classical hyperbolic Courant-Friedrichs-Lewy (CFL) condition is derived. This allows for a further acceleration of computational times and a better understanding of the underlying problem in finding a suitable discretization of the system. Numerical results of a series of different test examples confirm the accuracy and efficiency of the proposed method compared to the numerical solution of the full system.

74.9 | NA | May 22
An energy stable and conservative multiplicative dynamical low-rank discretization for the Su-Olson problem

Lena Baumann, Lukas Einkemmer, Christian Klingenberg et al.

Computing numerical solutions of the thermal radiative transfer equations on a finely resolved grid can be costly due to high computational and memory requirements. A numerical reduced order method that has recently been applied to a wide variety of kinetic partial differential equations is the concept of dynamical low-rank approximation (DLRA). In this paper, we consider the thermal radiative transfer equations with Su-Olson closure, leading to a linearized kinetic model. For the conducted theoretical and practical considerations we use a multiplicative splitting of the distribution function that poses additional challenges in finding an energy stable discretization and deriving a hyperbolic Courant-Friedrichs-Lewy (CFL) condition. We propose such an energy stable DLRA scheme that makes use of the augmented basis update & Galerkin integrator. This integrator allows for additional basis augmentations, enabling us to give a mathematically rigorous proof of energy stability and local mass conservation. Numerical examples confirm the derived properties and show the computational advantages of the DLRA scheme compared to a numerical solution of the full system of equations.

NA | Aug 20, 2018
Ray Effect Mitigation for the Discrete Ordinates Method through Quadrature Rotation

Thomas Camminady, Martin Frank, Kerstin Küpper et al.

Solving the radiation transport equation is a challenging task, due to the high dimensionality of the solution's phase space. The commonly used discrete ordinates (S$_N$) method suffers from ray effects which result from a break in rotational symmetry from the finite set of directions chosen by S$_N$. The spherical harmonics (P$_N$) equations, on the other hand, preserve rotational symmetry, but can produce negative particle densities. The discrete ordinates (S$_N$) method, in turn, by construction ensures non-negative particle densities. In this paper we present a modified version of the S$_N$ method, the rotated S$_N$ (rS$_N$) method. Compared to S$_N$, we add a rotation and interpolation step for the angular quadrature points and the respective function values after every time step. Thereby, the number of directions on which the solution evolves is effectively increased and ray effects are mitigated. Solution values on rotated ordinates are computed by an interpolation step. Implementation details are provided, and in our experiments the rotation/interpolation step only adds 5% to 10% to the runtime of the S$_N$ method. We apply the rS$_N$ method to the line-source and a lattice test case, both being prone to ray effects. Ray effects are reduced significantly, even for small numbers of quadrature points. The rS$_N$ method yields qualitatively similar solutions to the S$_N$ method with less than a third of the number of quadrature points, both for the line-source and the lattice problem. The code used to produce our results is freely available and can be downloaded.
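
The rotation part of the step described in this abstract can be illustrated with a minimal sketch. It assumes the quadrature directions are unit vectors in 3D and that the whole set is rotated rigidly about the z-axis; the choice of axis, the function names, and the sample directions are illustrative assumptions and are not taken from the paper. The interpolation of solution values onto the rotated ordinates is not shown.

```python
import math


def rotate_about_z(direction, angle):
    """Rotate one 3D direction vector about the z-axis by `angle` radians."""
    x, y, z = direction
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def rotate_quadrature(directions, angle):
    """Apply the same rigid rotation to every quadrature direction.

    Hypothetical helper: the axis (z) is an assumption made for this sketch.
    """
    return [rotate_about_z(d, angle) for d in directions]


# Three sample directions (illustrative, not an actual S_N quadrature set).
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
rotated = rotate_quadrature(directions, math.pi / 2)
```

Because a rotation preserves lengths and angles, the rotated points remain unit vectors with the same relative arrangement, so in a sketch like this the quadrature weights can be reused unchanged.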

LG | May 26, 2022
Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations

Steffen Schotthöfer, Emanuele Zangrando, Jonas Kusch et al.

Neural networks have achieved tremendous success in a large variety of applications. However, their memory footprint and computational demand can render them impractical in application settings with limited hardware or energy resources. In this work, we propose a novel algorithm to find efficient low-rank subnetworks. Remarkably, these subnetworks are determined and adapted already during the training phase and the overall time and memory resources required by both training and evaluating them are significantly reduced. The main idea is to restrict the weight matrices to a low-rank manifold and to update the low-rank factors rather than the full matrix during training. To derive training updates that are restricted to the prescribed manifold, we employ techniques from dynamic model order reduction for matrix differential equations. This allows us to provide approximation, stability, and descent guarantees. Moreover, our method automatically and dynamically adapts the ranks during training to achieve the desired approximation accuracy. The efficiency of the proposed method is demonstrated through a variety of numerical experiments on fully-connected and convolutional networks.
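
To make the memory argument concrete, the following sketch compares the parameter count of a dense weight matrix with that of a rank-r factorization, and applies a factorized layer without ever forming the full matrix. The three-factor form W ≈ U S Vᵀ, the function names, and the sample sizes are assumptions made for illustration; the abstract only states that low-rank factors are updated instead of the full matrix.

```python
def dense_params(n_out, n_in):
    """Number of entries in a full n_out x n_in weight matrix."""
    return n_out * n_in


def low_rank_params(n_out, n_in, rank):
    """Entries in U (n_out x r), S (r x r), and V (n_in x r)."""
    return n_out * rank + rank * rank + n_in * rank


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def low_rank_apply(U, S, V, x):
    """Compute y = U S V^T x factor by factor, never building U S V^T."""
    return matvec(U, matvec(S, matvec(transpose(V), x)))


# Illustrative sizes: a 512 x 512 layer stored at rank 16.
full = dense_params(512, 512)
factored = low_rank_params(512, 512, 16)

# Tiny rank-2 example: a 3 x 4 layer applied to a vector of ones.
U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
S = [[2.0, 0.0], [0.0, 3.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
y = low_rank_apply(U, S, V, [1.0, 1.0, 1.0, 1.0])
```

Evaluating the factors from right to left costs on the order of r(n_in + n_out) operations per input vector, which is where the savings over a dense layer come from when r is small.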

LG | Feb 14, 2025 | Code
HADL Framework for Noise Resilient Long-Term Time Series Forecasting

Aditya Dey, Jonas Kusch, Fadi Al Machot

Long-term time series forecasting is critical in domains such as finance, economics, and energy, where accurate and reliable predictions over extended horizons drive strategic decision-making. Despite the progress in machine learning-based models, the impact of temporal noise in extended lookback windows remains underexplored, often degrading model performance and computational efficiency. In this paper, we propose a novel framework that addresses these challenges by integrating the Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) to perform noise reduction and extract robust long-term features. These transformations enable the separation of meaningful temporal patterns from noise in both the time and frequency domains. To complement this, we introduce a lightweight low-rank linear prediction layer that not only reduces the influence of residual noise but also improves memory efficiency. Our approach demonstrates competitive robustness to noisy input, significantly reduces computational complexity, and achieves competitive or state-of-the-art forecasting performance across diverse benchmark datasets. Extensive experiments reveal that the proposed framework is particularly effective in scenarios with high noise levels or irregular patterns, making it well suited for real-world forecasting tasks. The code is available at https://github.com/forgee-master/HADL.

LG | Oct 24, 2024
GeoLoRA: Geometric integration for parameter efficient fine-tuning

Steffen Schotthöfer, Emanuele Zangrando, Gianluca Ceruti et al.

Low-Rank Adaptation (LoRA) has become a widely used method for parameter-efficient fine-tuning of large-scale, pre-trained neural networks. However, LoRA and its extensions face several challenges, including the need for rank adaptivity, robustness, and computational efficiency during the fine-tuning process. We introduce GeoLoRA, a novel approach that addresses these limitations by leveraging dynamical low-rank approximation theory. GeoLoRA requires only a single backpropagation pass over the small-rank adapters, significantly reducing computational cost as compared to similar dynamical low-rank training methods and making it faster than popular baselines such as AdaLoRA. This allows GeoLoRA to efficiently adapt the allocated parameter budget across the model, achieving smaller low-rank adapters compared to heuristic methods like AdaLoRA and LoRA, while maintaining critical convergence, descent, and error-bound theoretical guarantees. The resulting method is not only more efficient but also more robust to varying hyperparameter settings. We demonstrate the effectiveness of GeoLoRA on several state-of-the-art benchmarks, showing that it outperforms existing methods in both accuracy and computational efficiency.

NA | Feb 5, 2025
An Augmented Backward-Corrected Projector Splitting Integrator for Dynamical Low-Rank Training

Jonas Kusch, Steffen Schotthöfer, Alexandra Walter

Layer factorization has emerged as a widely used technique for training memory-efficient neural networks. However, layer factorization methods face several challenges, particularly a lack of robustness during the training process. To overcome this limitation, dynamical low-rank training methods have been developed, utilizing robust time integration techniques for low-rank matrix differential equations. Although these approaches facilitate efficient training, they still depend on computationally intensive QR and singular value decompositions of matrices with small rank. In this work, we introduce a novel low-rank training method that reduces the number of required QR decompositions. Our approach integrates an augmentation step into a projector-splitting scheme, ensuring convergence to a locally optimal solution. We provide a rigorous theoretical analysis of the proposed method and demonstrate its effectiveness across multiple benchmarks.

38.3 | LG | Mar 31
Tucker Attention: A generalization of approximate attention mechanisms

Timon Klein, Jonas Kusch, Sebastian Sager et al.

The pursuit of reducing the memory footprint of the self-attention mechanism in multi-head self-attention (MHA) has spawned a rich portfolio of methods, e.g., group-query attention (GQA) and multi-head latent attention (MLA). The methods leverage specialized low-rank factorizations across embedding dimensions or attention heads. From the point of view of classical low-rank approximation, these methods are unconventional and raise questions of which objects they really approximate and how to interpret the low-rank behavior of the resulting representations. To answer these questions, this work proposes a generalized view on the weight objects in the self-attention layer and a factorization strategy, which allows us to construct a parameter-efficient scheme, called Tucker Attention. Tucker Attention requires an order of magnitude fewer parameters for comparable validation metrics, compared to GQA and MLA, as evaluated in LLM and ViT test cases. Additionally, Tucker Attention encompasses GQA, MLA, and MHA as special cases and is fully compatible with flash-attention and rotary position embeddings (RoPE). This generalization strategy yields insights into the actual ranks achieved by MHA, GQA, and MLA, and further enables simplifications for MLA.

LG | Jun 20, 2025
A geometric framework for momentum-based optimizers for low-rank training

Steffen Schotthöfer, Timon Klein, Jonas Kusch

Low-rank pre-training and fine-tuning have recently emerged as promising techniques for reducing the computational and storage costs of large neural networks. Training low-rank parameterizations typically relies on conventional optimizers such as heavy ball momentum methods or Adam. In this work, we identify and analyze potential difficulties that these training methods encounter when used to train low-rank parameterizations of weights. In particular, we show that classical momentum methods can struggle to converge to a local optimum due to the geometry of the underlying optimization landscape. To address this, we introduce novel training strategies derived from dynamical low-rank approximation, which explicitly account for the underlying geometric structure. Our approach leverages and combines tools from dynamical low-rank approximation and momentum-based optimization to design optimizers that respect the intrinsic geometry of the parameter space. We validate our methods through numerical experiments, demonstrating faster convergence and stronger validation metrics at given parameter budgets.

LG | May 30, 2023
Geometry-aware training of factorized layers in tensor Tucker format

Emanuele Zangrando, Steffen Schotthöfer, Gianluca Ceruti et al.

Reducing parameter redundancies in neural network architectures is crucial for achieving feasible computational and memory requirements during training and inference phases. Given its easy implementation and flexibility, one promising approach is layer factorization, which reshapes weight tensors into a matrix format and parameterizes them as the product of two small-rank matrices. However, this approach typically requires an initial full-model warm-up phase and prior knowledge of a feasible rank, and it is sensitive to parameter initialization. In this work, we introduce a novel approach to train the factors of a Tucker decomposition of the weight tensors. Our training proposal proves to be optimal in locally approximating the original unfactorized dynamics independently of the initialization. Furthermore, the rank of each mode is dynamically updated during training. We provide a theoretical analysis of the algorithm, showing convergence, approximation, and local descent guarantees. The method's performance is further illustrated through a variety of experiments, showing remarkable training compression rates and comparable or even better performance than the full baseline and alternative layer factorization strategies.

NA | Oct 2, 2018
Maximum-principle-satisfying second-order Intrusive Polynomial Moment scheme

Jonas Kusch, Graham W. Alldredge, Martin Frank

Using standard intrusive techniques when solving hyperbolic conservation laws with uncertainties can lead to oscillatory solutions as well as nonhyperbolic moment systems. The Intrusive Polynomial Moment (IPM) method ensures hyperbolicity of the moment system while restricting oscillatory over- and undershoots of specified bounds. In this contribution, we derive a second-order discretization of the IPM moment system which fulfills the maximum principle. This task is carried out by investigating violations of the specified bounds due to the errors from the numerical optimization required by the scheme. This analysis gives weaker conditions on the entropy that is used, allowing the choice of an entropy which enables choosing the exact minimal and maximal value of the initial condition as bounds. Solutions calculated with the derived scheme are nonoscillatory while fulfilling the maximum principle. The second-order accuracy of our scheme leads to significantly reduced numerical costs.

NA | Oct 2, 2018
Filtered Stochastic Galerkin Methods For Hyperbolic Equations

Jonas Kusch, Ryan G. McClarren, Martin Frank

Uncertainty Quantification for nonlinear hyperbolic problems becomes a challenging task in the vicinity of shocks. Standard intrusive methods lead to oscillatory solutions and can result in non-hyperbolic moment systems. The intrusive polynomial moment (IPM) method guarantees hyperbolicity but comes at higher numerical costs. In this paper, we filter the gPC coefficients of the Stochastic Galerkin (SG) approximation, which allows a numerically cheap reduction of oscillations. The derived filter is based on Lasso regression which sets small gPC coefficients of high order to zero. We adaptively choose the filter strength to obtain a zero-valued highest order moment, which allows optimality of the corresponding optimization problem. The filtered SG method is tested for Burgers' and the Euler equations. Results show a reduction of oscillations at shocks, which leads to an improved approximation of expectation values and the variance compared to SG and IPM.
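
The filtering idea in this abstract can be sketched as a soft-thresholding of the gPC coefficients, which is the closed-form solution of a Lasso-type problem. The sketch below assumes an order-dependent penalty weight i(i+1), so that the mean (order 0) is untouched and higher orders are damped more strongly; this weight, the function names, and the sample coefficients are assumptions for illustration and may differ from the filter derived in the paper. The filter strength is chosen as described in the abstract, so that the highest-order coefficient becomes exactly zero.

```python
import math


def order_weight(i):
    """Assumed penalty weight for gPC order i (zero for the mean)."""
    return i * (i + 1)


def adaptive_strength(coeffs):
    """Smallest strength that zeroes the highest-order coefficient.

    Requires at least two coefficients so that the top weight is nonzero.
    """
    top = len(coeffs) - 1
    return abs(coeffs[top]) / order_weight(top)


def lasso_filter(coeffs, strength):
    """Soft-threshold each coefficient by strength * order_weight(i)."""
    filtered = []
    for i, c in enumerate(coeffs):
        shrunk = max(abs(c) - strength * order_weight(i), 0.0)
        # Keep the original sign; return a clean 0.0 when fully damped.
        filtered.append(math.copysign(shrunk, c) if shrunk > 0 else 0.0)
    return filtered


# Illustrative gPC coefficients for orders 0..3 (made-up values).
coeffs = [1.0, 0.5, -0.25, 0.75]
strength = adaptive_strength(coeffs)
filtered = lasso_filter(coeffs, strength)
```

Soft-thresholding sets small coefficients exactly to zero rather than merely scaling them, which is the property the abstract attributes to the Lasso-based filter.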