CHEM-PHDec 2, 2022Code
Gibbs-Helmholtz Graph Neural Network: capturing the temperature dependency of activity coefficients at infinite dilutionEdgar Ivan Sanchez Medina, Steffen Linke, Martin Stoll et al.
The accurate prediction of physicochemical properties of chemical compounds in mixtures (such as the activity coefficient at infinite dilution $γ_{ij}^\infty$) is essential for developing novel and more sustainable chemical processes. In this work, we analyze the performance of previously-proposed GNN-based models for the prediction of $γ_{ij}^\infty$, and compare them with several mechanistic models in a series of 9 isothermal studies. Moreover, we develop the Gibbs-Helmholtz Graph Neural Network (GH-GNN) model for predicting $\ln γ_{ij}^\infty$ of molecular systems at different temperatures. Our method combines the simplicity of a Gibbs-Helmholtz-derived expression with a series of graph neural networks that incorporate explicit molecular and intermolecular descriptors for capturing dispersion and hydrogen bonding effects. We have trained this model using experimentally determined $\ln γ_{ij}^\infty$ data of 40,219 binary-systems involving 1032 solutes and 866 solvents, overall showing superior performance compared to the popular UNIFAC-Dortmund model. We analyze the performance of GH-GNN for continuous and discrete inter/extrapolation and give indications for the model's applicability domain and expected accuracy. In general, GH-GNN is able to produce accurate predictions for extrapolated binary-systems if at least 25 systems with the same combination of solute-solvent chemical classes are contained in the training set and a similarity indicator above 0.35 is also present. This model and its applicability domain recommendations have been made open-source at https://github.com/edgarsmdn/GH-GNN.
NAOct 25, 2017
An inexact Newton-Krylov method for stochastic eigenvalue problemsPeter Benner, Akwum Onwunta, Martin Stoll
This paper aims at the efficient numerical solution of stochastic eigenvalue problems. Such problems often lead to prohibitively high dimensional systems with tensor product structure when discretized with the stochastic Galerkin method. Here, we exploit this inherent tensor product structure to develop a globalized low-rank inexact Newton method with which we tackle the stochastic eigenproblem. We illustrate the effectiveness of our solver with numerical experiments.
NANov 25, 2018
Efficient Numerical Methods for Gas Network Modeling and SimulationYue Qiu, Sara Grundel, Martin Stoll et al.
We study the modeling and simulation of gas pipeline networks, with a focus on fast numerical methods for the simulation of transient dynamics. The obtained mathematical model of the underlying network is represented by a nonlinear differential algebraic equation (DAE). With our modeling, we reduce the number of algebraic constraints, which correspond to the $(2,2)$ block in our semi-explicit DAE model, to the order of junction nodes in the network, where a junction node couples at least three pipelines. We can furthermore ensure that the $(1, 1)$ block of all system matrices including the Jacobian is block lower triangular by using a specific ordering of the pipes of the network. We then exploit this structure to propose an efficient preconditioner for the fast simulation of the network. We test our numerical methods on benchmark problems of (well-)known gas networks and the numerical results show the efficiency of our methods.
NANov 22, 2016
Preconditioning PDE-constrained optimization with $\rm L^1$-sparsity and control constraintsMargherita Porcelli, Valeria Simoncini, Martin Stoll
PDE-constrained optimization aims at finding optimal setups for partial differential equations so that relevant quantities are minimized. Including sparsity promoting terms in the formulation of such problems results in more practically relevant computed controls but adds more challenges to the numerical solution of these problems. The needed $\rm L^1$-terms as well as additional inclusion of box control constraints require the use of semismooth Newton methods. We propose robust preconditioners for different formulations of the Newton's equation. With the inclusion of a line-search strategy and an inexact approach for the solution of the linear systems, the resulting semismooth Newton's method is feasible for practical problems. Our results are underpinned by a theoretical analysis of the preconditioned matrix. Numerical experiments illustrate the robustness of the proposed scheme.
NAMar 17, 2017
Solving optimal control problems governed by random Navier-Stokes equations using low-rank methodsPeter Benner, Sergey Dolgov, Akwum Onwunta et al.
Many problems in computational science and engineering are simultaneously characterized by the following challenging issues: uncertainty, nonlinearity, nonstationarity and high dimensionality. Existing numerical techniques for such models would typically require considerable computational and storage resources. This is the case, for instance, for an optimization problem governed by time-dependent Navier-Stokes equations with uncertain inputs. In particular, the stochastic Galerkin finite element method often leads to a prohibitively high dimensional saddle-point system with tensor product structure. In this paper, we approximate the solution by the low-rank Tensor Train decomposition, and present a numerically efficient algorithm to solve the optimality equations directly in the low-rank representation. We show that the solution of the vorticity minimization problem with a distributed control admits a representation with ranks that depend modestly on model and discretization parameters even for high Reynolds numbers. For lower Reynolds numbers this is also the case for a boundary control. This opens the way for a reduced-order modeling of the stochastic optimal flow control with a moderate cost at all stages.
NAMar 16, 2017
Low-rank computation of posterior covariance matrices in Bayesian inverse problemsPeter Benner, Yue Qiu, Martin Stoll
We consider the problem of estimating the uncertainty in statistical inverse problems using Bayesian inference. When the probability density of the noise and the prior are Gaussian, the solution of such a statistical inverse problem is also Gaussian. Therefore, the underlying solution is characterized by the mean and covariance matrix of the posterior probability density. However, the covariance matrix of the posterior probability density is full and large. Hence, the computation of such a matrix is impossible for large dimensional parameter spaces. It is shown that for many ill-posed problems, the Hessian matrix of the data misfit part has low numerical rank and it is therefore possible to perform a low-rank approach to approximate the posterior covariance matrix. For such a low-rank approximation, one needs to solve a forward partial differential equation (PDE) and the adjoint PDE in both space and time. This in turn gives $\mathcal{O}(n_x n_t)$ complexity for both, computation and storage, where $n_x$ is the dimension of the spatial domain and $n_t$ is the dimension of the time domain. Such computations and storage demand are infeasible for large problems. To overcome this obstacle, we develop a new approach that utilizes a recently developed low-rank in time algorithm together with the low-rank Hessian method. We reduce both the computational complexity and storage requirement from $\mathcal{O}(n_x n_t)$ to $\mathcal{O}(n_x + n_t)$. We use numerical experiments to illustrate the advantages of our approach.
ROAug 10, 2023
The Integration of Prediction and Planning in Deep Learning Automated Driving Systems: A ReviewSteffen Hagedorn, Marcel Hallgarten, Martin Stoll et al.
Automated driving has the potential to revolutionize personal, public, and freight mobility. Beside accurately perceiving the environment, automated vehicles must plan a safe, comfortable, and efficient motion trajectory. To promote safety and progress, many works rely on modules that predict the future motion of surrounding traffic. Modular automated driving systems commonly handle prediction and planning as sequential, separate tasks. While this accounts for the influence of surrounding traffic on the ego vehicle, it fails to anticipate the reactions of traffic participants to the ego vehicle's behavior. Recent methods increasingly integrate prediction and planning in a joint or interdependent step to model bidirectional interactions. To date, a comprehensive overview of different integration principles is lacking. We systematically review state-of-the-art deep learning-based planning systems, and focus on how they integrate prediction. Different facets of the integration ranging from system architecture to high-level behavioral aspects are considered and related to each other. Moreover, we discuss the implications, strengths, and limitations of different integration principles. By pointing out research gaps, describing relevant future challenges, and highlighting trends in the research field, we identify promising directions for future research.
NANov 23, 2018
A low-rank tensor method for PDE-constrained optimization with isogeometric analysisAlexandra Bünger, Sergey Dolgov, Martin Stoll
Isogeometric analysis (IGA) has become one of the most popular methods for the discretization of partial differential equations motivated by the use of NURBS for geometric representations in industry and science. A crucial challenge lies in the solution of the discretized equations, which we discuss in this talk with a particular focus on PDE-constrained optimization discretized using IGA. The discretization results in a system of large mass and stiffness matrices, which are typically very costly to assemble. To reduce the computation time and storage requirements, low-rank tensor methods have become a promising tool. We present a framework for the assembly of these matrices in low-rank form as the sum of a small number of Kronecker products. For assembly of the smaller matrices only univariate integration is required. The resulting low rank Kronecker product structure of the mass and stiffness matrices can be used to solve a PDE-constrained optimization problem without assembling the actual system matrices. We present a framework which preserves and exploits the low-rank Kronecker product format for both the matrices and the solution. We use the block AMEn method to efficiently solve the corresponding KKT system of the optimization problem. We show several numerical experiments with 3D geometries to demonstrate that the low-rank assembly and solution drastically reduces the memory demands and computing times, depending on the approximation ranks of the domain.
NAFeb 15, 2017
Preconditioning of a coupled Cahn--Hilliard Navier--Stokes systemJessica Bosch, Christian Kahle, Martin Stoll
Recently, Garcke et al.[Garcke, Hinze, Kahle, A stable and linear time discretization for a thermodynamically consistent model for two-phase incompressible flow, Applied Numerical Mathematics 99, pp. 151-171, 2016] developed a consistent discretization scheme for a thermodynamically consistent diffuse interface model for incompressible two-phase flows with different densities. At the heart of this method lies the solution of large and sparse linear systems that arise in a semismooth Newton method. We propose the use of preconditioned Krylov subspace solvers using effective Schur complement approximations. Numerical results illustrate the efficiency of our approach. In particular, our preconditioner is shown to be robust with respect to parameter changes.
MLFeb 16, 2023
A weighted subspace exponential kernel for support tensor machinesKirandeep Kour, Sergey Dolgov, Peter Benner et al.
High-dimensional data in the form of tensors are challenging for kernel classification methods. To both reduce the computational complexity and extract informative features, kernels based on low-rank tensor decompositions have been proposed. However, what decisive features of the tensors are exploited by these kernels is often unclear. In this paper we propose a novel kernel that is based on the Tucker decomposition. For this kernel the Tucker factors are computed based on re-weighting of the Tucker matrices with tuneable powers of singular values from the HOSVD decomposition. This provides a mechanism to balance the contribution of the Tucker core and factors of the data. We benchmark support tensor machines with this new kernel on several datasets. First we generate synthetic data where two classes differ in either Tucker factors or core, and compare our novel and previously existing kernels. We show robustness of the new kernel with respect to both classification scenarios. We further test the new method on real-world datasets. The proposed kernel has demonstrated a higher test accuracy than the state-of-the-art tensor train multi-way multi-level kernel, and a significantly lower computational time.
NAJan 12, 2018
Fast iterative solvers for an optimal transport problemRoland Herzog, John W. Pearson, Martin Stoll
Optimal transport problems pose many challenges when considering their numerical treatment. We investigate the solution of a PDE-constrained optimisation problem subject to a particular transport equation arising from the modelling of image metamorphosis. We present the nonlinear optimisation problem, and discuss the discretisation and treatment of the nonlinearity via a Gauss--Newton scheme. We then derive preconditioners that can be used to solve the linear systems at the heart of the (Gauss--)Newton method. With the optical flow in mind, we further propose the reduction of dimensionality by choosing a radial basis function discretisation that uses the centres of superpixels as the collocation points. Again, we derive suitable preconditioners that can be used for this formulation.
MLMay 15, 2022
A comparison of PINN approaches for drift-diffusion equations on metric graphsJan Blechschmidt, Jan-Frederik Pietschman, Tom-Christian Riemer et al.
In this paper we focus on comparing machine learning approaches for quantum graphs, which are metric graphs, i.e., graphs with dedicated edge lengths, and an associated differential operator. In our case the differential equation is a drift-diffusion model. Computational methods for quantum graphs require a careful discretization of the differential operator that also incorporates the node conditions, in our case Kirchhoff-Neumann conditions. Traditional numerical schemes are rather mature but have to be tailored manually when the differential equation becomes the constraint in an optimization problem. Recently, physics informed neural networks (PINNs) have emerged as a versatile tool for the solution of partial differential equations from a range of applications. They offer flexibility to solve parameter identification or optimization problems by only slightly changing the problem formulation used for the forward simulation. We compare several PINN approaches for solving the drift-diffusion on the metric graph.
OCFeb 11, 2019
Interior Point Methods and Preconditioning for PDE-Constrained Optimization Problems Involving Sparsity TermsJohn W. Pearson, Margherita Porcelli, Martin Stoll
PDE-constrained optimization problems with control or state constraints are challenging from an analytical as well as numerical perspective. The combination of these constraints with a sparsity-promoting $\rm L^1$ term within the objective function requires sophisticated optimization methods. We propose the use of an Interior Point scheme applied to a smoothed reformulation of the discretized problem, and illustrate that such a scheme exhibits robust performance with respect to parameter changes. To increase the potency of this method we introduce fast and efficient preconditioners which enable us to solve problems from a number of PDE applications in low iteration numbers and CPU times, even when the parameters involved are altered dramatically.
NAJun 24, 2016
A Fast Multipole Method based on Band-limited Approximations for Radial Basis FunctionsWei Zhao, Martin Stoll
The meshless/meshfree radial basis function (RBF) method is a powerful technique for interpolating scattered data. But, solving large RBF interpolation problems without fast summation methods is computationally expensive. For RBF interpolation with $N$ points, using a direct method requires $\mathcal{O}(N^2)$ operations. As a fast summation method, the fast multipole method (FMM) has been implemented in speeding up the matrix-vector multiply, which reduces the complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(N^{1.5})$ and even to $\mathcal{O}(NlogN)$ for the multilevel fast multipole method (MLFMM). In this paper, we present a novel kernel-independent fast multipole method for RBF interpolation, which is used in combination with the evaluation of point-to-point interactions by RBF and the fast matrix-vector multiplication. This approach is based on band-limited approximation and quadrature rules, which extends the range of applicability of FMM.
32.9LGApr 3
Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory predictionDaniel Jost, Luca Paparusso, Martin Stoll et al.
In highly interactive driving scenes, trajectory prediction is conditioned on information from surrounding traffic participants such as cars and pedestrians. Our main contribution is a comprehensive analysis of state-of-the-art trajectory predictors, which reveals a surprising and critical flaw: many surrounding agents degrade prediction accuracy rather than improve it. Using Shapley-based attribution, we rigorously demonstrate that models learn unstable and non-causal decision-making schemes that vary significantly across training runs. Building on these insights, we propose to integrate a Conditional Information Bottleneck (CIB), which does not require additional supervision and is trained to effectively compress agent features as well as ignore those that are not beneficial for the prediction task. Comprehensive experiments using multiple datasets and model architectures demonstrate that this simple yet effective approach not only improves overall trajectory prediction performance in many cases but also increases robustness to different perturbations. Our results highlight the importance of selectively integrating contextual information, which can often contain spurious or misleading signals, in trajectory prediction. Moreover, we provide interpretable metrics for identifying non-robust behavior and present a promising avenue towards a solution.
LGApr 26, 2024Code
Fast Evaluation of Additive Kernels: Feature Arrangement, Fourier Methods, and Kernel DerivativesTheresa Wagner, Franziska Nestler, Martin Stoll
One of the main computational bottlenecks when working with kernel based learning is dealing with the large and typically dense kernel matrix. Techniques dealing with fast approximations of the matrix vector product for these kernel matrices typically deteriorate in their performance if the feature vectors reside in higher-dimensional feature spaces. We here present a technique based on the non-equispaced fast Fourier transform (NFFT) with rigorous error analysis. We show that this approach is also well suited to allow the approximation of the matrix that arises when the kernel is differentiated with respect to the kernel hyperparameters; a problem often found in the training phase of methods such as Gaussian processes. We also provide an error analysis for this case. We illustrate the performance of the additive kernel scheme with fast matrix vector products on a number of data sets. Our code is available at https://github.com/wagnertheresa/NFFTAddKer
ROApr 11, 2024
Can Vehicle Motion Planning Generalize to Realistic Long-tail Scenarios?Marcel Hallgarten, Julian Zapata, Martin Stoll et al.
Real-world autonomous driving systems must make safe decisions in the face of rare and diverse traffic scenarios. Current state-of-the-art planners are mostly evaluated on real-world datasets like nuScenes (open-loop) or nuPlan (closed-loop). In particular, nuPlan seems to be an expressive evaluation method since it is based on real-world data and closed-loop, yet it mostly covers basic driving scenarios. This makes it difficult to judge a planner's capabilities to generalize to rarely-seen situations. Therefore, we propose a novel closed-loop benchmark interPlan containing several edge cases and challenging driving scenarios. We assess existing state-of-the-art planners on our benchmark and show that neither rule-based nor learning-based planners can safely navigate the interPlan scenarios. A recently evolving direction is the usage of foundation models like large language models (LLM) to handle generalization. We evaluate an LLM-only planner and introduce a novel hybrid planner that combines an LLM-based behavior planner with a rule-based motion planner that achieves state-of-the-art performance on our benchmark.
LGApr 1, 2025
Preconditioned Additive Gaussian Processes with Fourier AccelerationTheresa Wagner, Tianshi Xu, Franziska Nestler et al.
Gaussian processes (GPs) are crucial in machine learning for quantifying uncertainty in predictions. However, their associated covariance matrices, defined by kernel functions, are typically dense and large-scale, posing significant computational challenges. This paper introduces a matrix-free method that utilizes the Non-equispaced Fast Fourier Transform (NFFT) to achieve nearly linear complexity in the multiplication of kernel matrices and their derivatives with vectors for a predetermined accuracy level. To address high-dimensional problems, we propose an additive kernel approach. Each sub-kernel in this approach captures lower-order feature interactions, allowing for the efficient application of the NFFT method and potentially increasing accuracy across various real-world datasets. Additionally, we implement a preconditioning strategy that accelerates hyperparameter tuning, further improving the efficiency and effectiveness of GPs.
LGOct 6, 2025
KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code EmbeddingsAhmed Elhussein, Paul Meddeb, Abigail Newbury et al.
Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade off: knowledge graph based approaches capture formal relationships but miss real world patterns, while data driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge preserving and Empirically refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Importantly, KEEP produces final embeddings without task specific auxiliary or end to end training enabling KEEP to support multiple downstream applications and model architectures. Evaluations on structured EHR from UK Biobank and MIMIC IV demonstrate that KEEP outperforms both traditional and Language Model based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP's minimal computational requirements make it particularly suitable for resource constrained environments.
LGMay 7, 2025
Physics-Informed DeepONets for drift-diffusion on metric graphs: simulation and parameter identificationJan Blechschmidt, Tom-Christian Riemer, Max Winkler et al.
We develop a novel physics informed deep learning approach for solving nonlinear drift-diffusion equations on metric graphs. These models represent an important model class with a large number of applications in areas ranging from transport in biological cells to the motion of human crowds. While traditional numerical schemes require a large amount of tailoring, especially in the case of model design or parameter identification problems, physics informed deep operator networks (DeepONet) have emerged as a versatile tool for the solution of partial differential equations with the particular advantage that they easily incorporate parameter identification questions. We here present an approach where we first learn three DeepONet models for representative inflow, inner and outflow edges, resp., and then subsequently couple these models for the solution of the drift-diffusion metric graph problem by relying on an edge-based domain decomposition approach. We illustrate that our framework is applicable for the accurate evaluation of graph-coupled physics models and is well suited for solving optimization or inverse problems on these coupled networks.
NAApr 30, 2025
Low-rank computation of the posterior mean in Multi-Output Gaussian ProcessesSebastian Esche, Martin Stoll
Gaussian processes (GP) are a versatile tool in machine learning and computational science. We here consider the case of multi-output Gaussian processes (MOGP) and present low-rank approaches for efficiently computing the posterior mean of a MOGP. Starting from low-rank spatio-temporal data we consider a structured covariance function, assuming separability across space and time. This separability, in turn, gives a decomposition of the covariance matrix into a Kronecker product of individual covariance matrices. Incorporating the typical noise term to the model then requires the solution of a large-scale Stein equation for computing the posterior mean. For this, we propose efficient low-rank methods based on a combination of a LRPCG method with the Sylvester equation solver KPIK adjusted for solving Stein equations. We test the developed method on real world street network graphs by using graph filters as covariance matrices. Moreover, we propose a degree-weighted average covariance matrix, which can be employed under specific assumptions to achieve more efficient convergence.
LGNov 19, 2021
Learning in High-Dimensional Feature Spaces Using ANOVA-Based Fast Matrix-Vector MultiplicationFranziska Nestler, Martin Stoll, Theresa Wagner
Kernel matrices are crucial in many learning tasks such as support vector machines or kernel ridge regression. The kernel matrix is typically dense and large-scale. Depending on the dimension of the feature space even the computation of all of its entries in reasonable time becomes a challenging task. For such dense matrices the cost of a matrix-vector product scales quadratically with the dimensionality N , if no customized methods are applied. We propose the use of an ANOVA kernel, where we construct several kernels based on lower-dimensional feature spaces for which we provide fast algorithms realizing the matrix-vector products. We employ the non-equispaced fast Fourier transform (NFFT), which is of linear complexity for fixed accuracy. Based on a feature grouping approach, we then show how the fast matrix-vector products can be embedded into a learning method choosing kernel ridge regression and the conjugate gradient solver. We illustrate the performance of our approach on several data sets.
MLSep 15, 2021
RaWaNet: Enriching Graph Neural Network Input via Random Walks on GraphsAnahita Iravanizad, Edgar Ivan Sanchez Medina, Martin Stoll
In recent years, graph neural networks (GNNs) have gained increasing popularity and have shown very promising results for data that are represented by graphs. The majority of GNN architectures are designed based on developing new convolutional and/or pooling layers that better extract the hidden and deeper representations of the graphs to be used for different prediction tasks. The inputs to these layers are mainly the three default descriptors of a graph, node features $(X)$, adjacency matrix $(A)$, and edge features $(W)$ (if available). To provide a more enriched input to the network, we propose a random walk data processing of the graphs based on three selected lengths. Namely, (regular) walks of length 1 and 2, and a fractional walk of length $γ\in (0,1)$, in order to capture the different local and global dynamics on the graphs. We also calculate the stationary distribution of each random walk, which is then used as a scaling factor for the initial node features ($X$). This way, for each graph, the network receives multiple adjacency matrices along with their individual weighting for the node features. We test our method on various molecular datasets by passing the processed node features to the network in order to perform several classification and regression tasks. Interestingly, our method, not using edge features which are heavily exploited in molecular graph learning, let a shallow network outperform well known deep GNNs.
LGApr 16, 2021
An Empirical Study of Graph-Based Approaches for Semi-Supervised Time Series ClassificationDominik Alfke, Miriam Gondos, Lucile Peroche et al.
Time series data play an important role in many applications and their analysis reveals crucial information for understanding the underlying processes. Among the many time series learning tasks of great importance, we here focus on semi-supervised learning based on a graph representation of the data. Two main aspects are involved in this task. A suitable distance measure to evaluate the similarities between time series, and a learning method to make predictions based on these distances. However, the relationship between the two aspects has never been studied systematically in the context of graph-based learning. We describe four different distance measures, including (Soft) DTW and MPDist, a distance measure based on the Matrix Profile, as well as four successful semi-supervised learning methods, including the graph Allen--Cahn method and a Graph Convolutional Neural Network. We then compare the performance of the algorithms on binary classification data sets. In our findings we compare the chosen graph-based methods using all distance measures and observe that the results vary strongly with respect to the accuracy. As predicted by the ``no free lunch'' theorem, no clear best combination to employ in all cases is found. Our study provides a reproducible framework for future work in the direction of semi-supervised learning for time series with a focus on graph representations.
LGAug 3, 2020
Pseudoinverse Graph Convolutional Networks: Fast Filters Tailored for Large Eigengaps of Dense Graphs and HypergraphsDominik Alfke, Martin Stoll
Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised classification on graph-based datasets. We propose a new GCN variant whose three-part filter space is targeted at dense graphs. Examples include Gaussian graphs for 3D point clouds with an increased focus on non-local information, as well as hypergraphs based on categorical data. These graphs differ from the common sparse benchmark graphs in terms of the spectral properties of their graph Laplacian. Most notably we observe large eigengaps, which are unfavorable for popular existing GCN architectures. Our method overcomes these issues by utilizing the pseudoinverse of the Laplacian. Another key ingredient is a low-rank approximation of the convolutional matrix, ensuring computational efficiency and increasing accuracy at the same time. We outline how the necessary eigeninformation can be computed efficiently in each applications and discuss the appropriate choice of the only metaparameter, the approximation rank. We finally showcase our method's performance regarding runtime and accuracy in various experiments with real-world datasets.
LGJul 10, 2020
Semi-supervised Learning for Aggregated Multilayer Graphs Using Diffuse Interface Methods and Fast Matrix Vector ProductsKai Bergermann, Martin Stoll, Toni Volkmer
We generalize a graph-based multiclass semi-supervised classification technique based on diffuse interface methods to multilayer graphs. Besides the treatment of various applications with an inherent multilayer structure, we present a very flexible approach that interprets high-dimensional data in a low-dimensional multilayer graph representation. Highly efficient numerical methods involving the spectral decomposition of the corresponding differential graph operators as well as fast matrix-vector products based on the nonequispaced fast Fourier transform (NFFT) enable the rapid treatment of large and high-dimensional data sets. We perform various numerical tests putting a special focus on image segmentation. In particular, we test the performance of our method on data sets with up to 10 million nodes per layer as well as up to 104 dimensions resulting in graphs with up to 52 layers. While all presented numerical experiments can be run on an average laptop computer, the linear dependence per iteration step of the runtime on the network size in all stages of our algorithm makes it scalable to even larger and higher-dimensional problems.
LGFeb 12, 2020
Efficient Structure-preserving Support Tensor Train MachineKirandeep Kour, Sergey Dolgov, Martin Stoll et al.
An increasing amount of collected data are high-dimensional multi-way arrays (tensors), and it is crucial for efficient learning algorithms to exploit this tensorial structure as much as possible. The ever-present curse of dimensionality for high dimensional data and the loss of structure when vectorizing the data motivates the use of tailored low-rank tensor classification methods. In the presence of small amounts of training data, kernel methods offer an attractive choice as they provide the possibility for a nonlinear decision boundary. We develop the Tensor Train Multi-way Multi-level Kernel (TT-MMK), which combines the simplicity of the Canonical Polyadic decomposition, the classification power of the Dual Structure-preserving Support Vector Machine, and the reliability of the Tensor Train (TT) approximation. We show by experiments that the TT-MMK method is usually more reliable computationally, less sensitive to tuning parameters, and gives higher prediction accuracy in the SVM classification when benchmarked against other state-of-the-art techniques.
NADec 17, 2019
A literature survey of matrix methods for data scienceMartin Stoll
Efficient numerical linear algebra is a core ingredient in many applications across almost all scientific and industrial disciplines. With this survey we want to illustrate that numerical linear algebra has played and is playing a crucial role in enabling and improving data science computations with many new developments being fueled by the availability of data and computing resources. We highlight the role of various different factorizations and the power of changing the representation of the data as well as discussing topics such as randomized algorithms, functions of matrices, and high-dimensional problems. We briefly touch upon the role of techniques from numerical linear algebra used within deep learning.
LGMay 24, 2019
Semi-Supervised Classification on Non-Sparse Graphs Using Low-Rank Graph Convolutional NetworksDominik Alfke, Martin Stoll
Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised learning on graph-based datasets. For sparse graphs, linear and polynomial filter functions have yielded impressive results. For large non-sparse graphs, however, network training and evaluation becomes prohibitively expensive. By introducing low-rank filters, we gain significant runtime acceleration and simultaneously improved accuracy. We further propose an architecture change mimicking techniques from Model Order Reduction in what we call a reduced-order GCN. Moreover, we present how our method can also be applied to hypergraph datasets and how hypergraph convolution can be implemented efficiently.
SISep 7, 2018
Node Classification for Signed Social Networks Using Diffuse Interface MethodsPedro Mercado, Jessica Bosch, Martin Stoll
Signed networks contain both positive and negative kinds of interactions like friendship and enmity. The task of node classification in non-signed graphs has proven to be beneficial in many real world applications, yet extensions to signed networks remain largely unexplored. In this paper we introduce the first analysis of node classification in signed social networks via diffuse interface methods based on the Ginzburg-Landau functional together with different extensions of the graph Laplacian to signed networks. We show that blending the information from both positive and negative interactions leads to performance improvement in real signed social networks, consistently outperforming the current state of the art.
LGAug 14, 2018
NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networksDominik Alfke, Daniel Potts, Martin Stoll et al.
The graph Laplacian is a standard tool in data science, machine learning, and image processing. The corresponding matrix inherits the complex structure of the underlying network and is in certain applications densely populated. This makes computations, in particular matrix-vector products, with the graph Laplacian a hard task. A typical application is the computation of a number of its eigenvalues and eigenvectors. Standard methods become infeasible as the number of nodes in the graph is too large. We propose the use of the fast summation based on the nonequispaced fast Fourier transform (NFFT) to perform the dense matrix-vector product with the graph Laplacian fast without ever forming the whole matrix. The enormous flexibility of the NFFT algorithm allows us to embed the accelerated multiplication into Lanczos-based eigenvalues routines or iterative linear system solvers and even consider other than the standard Gaussian kernels. We illustrate the feasibility of our approach on a number of test problems from image segmentation to semi-supervised learning based on graph-based PDEs. In particular, we compare our approach with the Nyström method. Moreover, we present and test an enhanced, hybrid version of the Nyström method, which internally uses the NFFT.
MLNov 18, 2016
Generalizing diffuse interface methods on graphs: non-smooth potentials and hypergraphsJessica Bosch, Steffen Klamt, Martin Stoll
Diffuse interface methods have recently been introduced for the task of semi-supervised learning. The underlying model is well-known in materials science but was extended to graphs using a Ginzburg--Landau functional and the graph Laplacian. We here generalize the previously proposed model by a non-smooth potential function. Additionally, we show that the diffuse interface method can be used for the segmentation of data coming from hypergraphs. For this we show that the graph Laplacian in almost all cases is derived from hypergraph information. Additionally, we show that the formerly introduced hypergraph Laplacian coming from a relaxed optimization problem is well suited to be used within the diffuse interface method. We present computational experiments for graph and hypergraph Laplacians.