LGMay 24, 2024
Learning from Linear Algebra: A Graph Neural Network Approach to Preconditioner Design for Conjugate Gradient SolversVladislav Trifonov, Alexander Rudikov, Oleg Iliev et al.
Large linear systems are ubiquitous in modern computational science and engineering. The main recipe for solving them is the use of Krylov subspace iterative methods with well-designed preconditioners. Recently, GNNs have been shown to be a promising tool for designing preconditioners to reduce the overall computational cost of iterative methods by constructing them more efficiently than with classical linear algebra techniques. Preconditioners designed with these approaches cannot outperform those designed with classical methods in terms of the number of iterations in CG. In our work, we recall well-established preconditioners from linear algebra and use them as a starting point for training the GNN to obtain preconditioners that reduce the condition number of the system more significantly than classical preconditioners. Numerical experiments show that our approach outperforms both classical and neural network-based methods for an important class of parametric partial differential equations. We also provide a heuristic justification for the loss function used and show that preconditioners obtained by learning with this loss function reduce the condition number in a more desirable way for CG.
LGApr 15, 2024
Quantization of Large Language Models with an Overdetermined BasisDaniil Merkulov, Daria Cherniuk, Alexander Rudikov et al.
In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. This approach hinges on decomposing any given vector, matrix, or tensor into two factors. The first factor maintains a small infinity norm, while the second exhibits a similarly constrained norm when multiplied by an orthogonal matrix. Surprisingly, the entries of factors after decomposition are well-concentrated around several peaks, which allows us to efficiently replace them with corresponding centroids for quantization purposes. We study the theoretical properties of the proposed approach and rigorously evaluate our compression algorithm in the context of next-word prediction tasks and on a set of downstream tasks for text classification. Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance while ensuring data compression, marking a significant advancement in the field of data quantization.
LGSep 27, 2025
Deep Learning for Subspace RegressionVladimir Fanaskov, Vladislav Trifonov, Alexander Rudikov et al.
It is often possible to perform reduced order modelling by specifying linear subspace which accurately captures the dynamics of the system. This approach becomes especially appealing when linear subspace explicitly depends on parameters of the problem. A practical way to apply such a scheme is to compute subspaces for a selected set of parameters in the computationally demanding offline stage and in the online stage approximate subspace for unknown parameters by interpolation. For realistic problems the space of parameters is high dimensional, which renders classical interpolation strategies infeasible or unreliable. We propose to relax the interpolation problem to regression, introduce several loss functions suitable for subspace data, and use a neural network as an approximation to high-dimensional target function. To further simplify a learning problem we introduce redundancy: in place of predicting subspace of a given dimension we predict larger subspace. We show theoretically that this strategy decreases the complexity of the mapping for elliptic eigenproblems with constant coefficients and makes the mapping smoother for general smooth function on the Grassmann manifold. Empirical results also show that accuracy significantly improves when larger-than-needed subspaces are predicted. With the set of numerical illustrations we demonstrate that subspace regression can be useful for a range of tasks including parametric eigenproblems, deflation techniques, relaxation methods, optimal control and solution of parametric partial differential equations.
LGJun 7, 2024
ConDiff: A Challenging Dataset for Neural Solvers of Partial Differential EquationsVladislav Trifonov, Alexander Rudikov, Oleg Iliev et al.
We present ConDiff, a novel dataset for scientific machine learning. ConDiff focuses on the parametric diffusion equation with space dependent coefficients, a fundamental problem in many applications of partial differential equations (PDEs). The main novelty of the proposed dataset is that we consider discontinuous coefficients with high contrast. These coefficient functions are sampled from a selected set of distributions. This class of problems is not only of great academic interest, but is also the basis for describing various environmental and industrial problems. In this way, ConDiff shortens the gap with real-world problems while remaining fully synthetic and easy to use. ConDiff consists of a diverse set of diffusion equations with coefficients covering a wide range of contrast levels and heterogeneity with a measurable complexity metric for clearer comparison between different coefficient functions. We baseline ConDiff on standard deep learning models in the field of scientific machine learning. By providing a large number of problem instances, each with its own coefficient function and right-hand side, we hope to encourage the development of novel physics-based deep learning approaches, such as neural operators, ultimately driving progress towards more accurate and efficient solutions of complex PDE problems.
COMP-PHJun 4, 2024
Astral: training physics-informed neural networks with error majorantsVladimir Fanaskov, Tianchi Yu, Alexander Rudikov et al.
The primal approach to physics-informed learning is a residual minimization. We argue that residual is, at best, an indirect measure of the error of approximate solution and propose to train with error majorant instead. Since error majorant provides a direct upper bound on error, one can reliably estimate how close PiNN is to the exact solution and stop the optimization process when the desired accuracy is reached. We call loss function associated with error majorant \textbf{Astral}: neur\textbf{A}l a po\textbf{ST}erio\textbf{R}i function\textbf{A}l \textbf{L}oss. To compare Astral and residual loss functions, we illustrate how error majorants can be derived for various PDEs and conduct experiments with diffusion equations (including anisotropic and in the L-shaped domain), convection-diffusion equation, temporal discretization of Maxwell's equation, magnetostatics and nonlinear elastoplasticity problems. The results indicate that Astral loss is competitive to the residual loss, typically leading to faster convergence and lower error. The main benefit of using Astral loss comes from its ability to estimate error, which is impossible with other loss functions. Our experiments indicate that the error estimate obtained with Astral loss is usually tight enough, e.g., for a highly anisotropic equation, on average, Astral overestimates error by a factor of $1.5$, and for convection-diffusion by a factor of $1.7$. We further demonstrate that Astral loss is better correlated with error than residual and is a more reliable predictor of the error value. Moreover, unlike residual, the error indicator obtained from Astral loss has a superb spatial correlation with error. Backed with the empirical and theoretical results, we argue that one can productively use Astral loss to perform reliable error analysis and approximate PDE solutions with accuracy similar to standard residual-based techniques.