Matthew West

LG
h-index34
16papers
138citations
Novelty49%
AI Score46

16 Papers

LGApr 6, 2023
Hierarchical Graph Neural Network with Cross-Attention for Cross-Device User Matching

Ali Taghibakhshi, Mingyuan Ma, Ashwath Aithal et al. · nvidia

Cross-device user matching is a critical problem in numerous domains, including advertising, recommender systems, and cybersecurity. It involves identifying and linking different devices belonging to the same person, utilizing sequence logs. Previous data mining techniques have struggled to address the long-range dependencies and higher-order connections between the logs. Recently, researchers have modeled this problem as a graph problem and proposed a two-tier graph contextual embedding (TGCE) neural network architecture, which outperforms previous methods. In this paper, we propose a novel hierarchical graph neural network architecture (HGNN), which has a more computationally efficient second level design than TGCE. Furthermore, we introduce a cross-attention (Cross-Att) mechanism in our model, which improves performance by 5% compared to the state-of-the-art TGCE method.

LGMay 19, 2022
Learning Interface Conditions in Domain Decomposition Solvers

Ali Taghibakhshi, Nicolas Nytko, Tareq Zaman et al.

Domain decomposition methods are widely used and effective in the approximation of solutions to partial differential equations. Yet the optimal construction of these methods requires tedious analysis and is often available only in simplified, structured-grid settings, limiting their use for more complex problems. In this work, we generalize optimized Schwarz domain decomposition methods to unstructured-grid problems, using Graph Convolutional Neural Networks (GCNNs) and unsupervised learning to learn optimal modifications at subdomain interfaces. A key ingredient in our approach is an improved loss function, enabling effective training on relatively small problems, but robust performance on arbitrarily large problems, with computational cost linear in problem size. The performance of the learned linear solvers is compared with both classical and optimized domain decomposition algorithms, for both structured- and unstructured-grid problems.

LGJan 26, 2023
MG-GNN: Multigrid Graph Neural Networks for Learning Multilevel Domain Decomposition Methods

Ali Taghibakhshi, Nicolas Nytko, Tareq Uz Zaman et al.

Domain decomposition methods (DDMs) are popular solvers for discretized systems of partial differential equations (PDEs), with one-level and multilevel variants. These solvers rely on several algorithmic and mathematical parameters, prescribing overlap, subdomain boundary conditions, and other properties of the DDM. While some work has been done on optimizing these parameters, it has mostly focused on the one-level setting or special cases such as structured-grid discretizations with regular subdomain construction. In this paper, we propose multigrid graph neural networks (MG-GNN), a novel GNN architecture for learning optimized parameters in two-level DDMs\@. We train MG-GNN using a new unsupervised loss function, enabling effective training on small problems that yields robust performance on unstructured grids that are orders of magnitude larger than those in the training set. We show that MG-GNN outperforms popular hierarchical graph network architectures for this optimization and that our proposed loss function is critical to achieving this improved performance.

AIMay 30, 2022Code
Truly Deterministic Policy Optimization

Ehsan Saleh, Saba Ghaffari, Timothy Bretl et al.

In this paper, we present a policy gradient method that avoids exploratory noise injection and performs policy search over the deterministic landscape. By avoiding noise injection all sources of estimation variance can be eliminated in systems with deterministic dynamics (up to the initial state distribution). Since deterministic policy regularization is impossible using traditional non-metric measures such as the KL divergence, we derive a Wasserstein-based quadratic model for our purposes. We state conditions on the system model under which it is possible to establish a monotonic policy improvement guarantee, propose a surrogate function for policy gradient estimation, and show that it is possible to compute exact advantage estimates if both the state transition model and the policy are deterministic. Finally, we describe two novel robotic control environments -- one with non-local rewards in the frequency domain and the other with a long horizon (8000 time-steps) -- for which our policy gradient method (TDPO) significantly outperforms existing methods (PPO, TRPO, DDPG, and TD3). Our implementation with all the experimental settings is available at https://github.com/ehsansaleh/code_tdpo

AIApr 8, 2022
Efficient Feedback and Partial Credit Grading for Proof Blocks Problems

Seth Poulsen, Shubhang Kulkarni, Geoffrey Herman et al.

Proof Blocks is a software tool that allows students to practice writing mathematical proofs by dragging and dropping lines instead of writing proofs from scratch. Proof Blocks offers the capability of assigning partial credit and providing solution quality feedback to students. This is done by computing the edit distance from a student's submission to some predefined set of solutions. In this work, we propose an algorithm for the edit distance problem that significantly outperforms the baseline procedure of exhaustively enumerating over the entire search space. Our algorithm relies on a reduction to the minimum vertex cover problem. We benchmark our algorithm on thousands of student submissions from multiple courses, showing that the baseline algorithm is intractable, and that our proposed algorithm is critical to enable classroom deployment. Our new algorithm has also been used for problems in many other domains where the solution space can be modeled as a DAG, including but not limited to Parsons Problems for writing code, helping students understand packet ordering in networking protocols, and helping students sketch solution steps for physics problems. Integrated into multiple learning management systems, the algorithm serves thousands of students each year.

CYMay 18
Automated Grading of Handwritten Mathematics Using Vision-Capable LLMs

Jacob Levine, Miguel Aenlle, Craig Zilles et al.

Automated grading systems have enabled scalable assessment for many response types, but handwritten mathematics remains a barrier due to the complexity of multi-step solutions. Vision-capable large language models (LLMs) offer new opportunities here, yet their reliability in authentic instructional settings remains poorly understood. We present an empirical evaluation of an LLM-based grader for handwritten mathematical work using instructor-defined rubrics. Extending a prior pipeline for typed responses, we integrate transcription and rubric-based evaluation of photographic submissions within a single LLM call, evaluating on student work from two university STEM courses. Comparing AI grading decisions against human-assigned ground truth at the rubric-item level, we observe high overall accuracy, with most errors -- 87\% in the best model -- attributable to transcription failures rather than rubric misapplication. We categorize common error modes, including image quality issues, hallucinated content, and incorrect handling of equivalent expressions. These findings highlight both the promise and limitations of LLM-based grading for handwritten mathematics, providing guidance for system design, prompt refinement, and deployment in educational settings.

LGMay 27, 2023Code
Learning from Integral Losses in Physics Informed Neural Networks

Ehsan Saleh, Saba Ghaffari, Timothy Bretl et al.

This work proposes a solution for the problem of training physics-informed networks under partial integro-differential equations. These equations require an infinite or a large number of neural evaluations to construct a single residual for training. As a result, accurate evaluation may be impractical, and we show that naive approximations at replacing these integrals with unbiased estimates lead to biased loss functions and solutions. To overcome this bias, we investigate three types of potential solutions: the deterministic sampling approaches, the double-sampling trick, and the delayed target method. We consider three classes of PDEs for benchmarking; one defining Poisson problems with singular charges and weak solutions of up to 10 dimensions, another involving weak solutions on electro-magnetic fields and a Maxwell equation, and a third one defining a Smoluchowski coagulation problem. Our numerical results confirm the existence of the aforementioned bias in practice and also show that our proposed delayed target approach can lead to accurate solutions with comparable quality to ones estimated with a large sample size integral. Our implementation is open-source and available at https://github.com/ehsansaleh/btspinn.

AO-PHOct 11, 2025
Generative Modeling of Aerosol State Representations

Ehsan Saleh, Saba Ghaffari, Jeffrey H. Curtis et al.

Aerosol-cloud--radiation interactions remain among the most uncertain components of the Earth's climate system, in partdue to the high dimensionality of aerosol state representations and the difficulty of obtaining complete \textit{in situ} measurements. Addressing these challenges requires methods that distill complex aerosol properties into compact yet physically meaningful forms. Generative autoencoder models provide such a pathway. We present a framework for learning deep variational autoencoder (VAE) models of speciated mass and number concentration distributions, which capture detailed aerosol size-composition characteristics. By compressing hundreds of original dimensions into ten latent variables, the approach enables efficient storage and processing while preserving the fidelity of key diagnostics, including cloud condensation nuclei (CCN) spectra, optical scattering and absorption coefficients, and ice nucleation properties. Results show that CCN spectra are easiest to reconstruct accurately, optical properties are moderately difficult, and ice nucleation properties are the most challenging. To improve performance, we introduce a preprocessing optimization strategy that avoids repeated retraining and yields latent representations resilient to high-magnitude Gaussian noise, boosting accuracy for CCN spectra, optical coefficients, and frozen fraction spectra. Finally, we propose a novel realism metric -- based on the sliced Wasserstein distance between generated samples and a held-out test set -- for optimizing the KL divergence weight in VAEs. Together, these contributions enable compact, robust, and physically meaningful representations of aerosol states for large-scale climate applications.

LGMar 30, 2024
Leveraging Pre-trained and Transformer-derived Embeddings from EHRs to Characterize Heterogeneity Across Alzheimer's Disease and Related Dementias

Matthew West, Colin Magdamo, Lily Cheng et al.

Alzheimer's disease is a progressive, debilitating neurodegenerative disease that affects 50 million people globally. Despite this substantial health burden, available treatments for the disease are limited and its fundamental causes remain poorly understood. Previous work has suggested the existence of clinically-meaningful sub-types, which it is suggested may correspond to distinct etiologies, disease courses, and ultimately appropriate treatments. Here, we use unsupervised learning techniques on electronic health records (EHRs) from a cohort of memory disorder patients to characterise heterogeneity in this disease population. Pre-trained embeddings for medical codes as well as transformer-derived Clinical BERT embeddings of free text are used to encode patient EHRs. We identify the existence of sub-populations on the basis of comorbidities and shared textual features, and discuss their clinical significance.

CYJun 7, 2021
Proof Blocks: Autogradable Scaffolding Activities for Learning to Write Proofs

Seth Poulsen, Mahesh Viswanathan, Geoffrey L. Herman et al.

Proof Blocks is a software tool which enables students to write proofs by dragging and dropping prewritten proof lines into the correct order. These proofs can be graded completely automatically, enabling students to receive rapid feedback on how they are doing with their proofs. When constructing a problem, the instructor specifies the dependency graph of the lines of the proof, so that any correct arrangement of the lines can receive full credit. This innovation can improve assessment tools by increasing the types of questions we can ask students about proofs, and can give greater access to proof knowledge by increasing the amount that students can learn on their own with the help of a computer.

LGJun 3, 2021
Optimization-Based Algebraic Multigrid Coarsening Using Reinforcement Learning

Ali Taghibakhshi, Scott MacLachlan, Luke Olson et al.

Large sparse linear systems of equations are ubiquitous in science and engineering, such as those arising from discretizations of partial differential equations. Algebraic multigrid (AMG) methods are one of the most common methods of solving such linear systems, with an extensive body of underlying mathematical theory. A system of linear equations defines a graph on the set of unknowns and each level of a multigrid solver requires the selection of an appropriate coarse graph along with restriction and interpolation operators that map to and from the coarse representation. The efficiency of the multigrid solver depends critically on this selection and many selection methods have been developed over the years. Recently, it has been demonstrated that it is possible to directly learn the AMG interpolation and restriction operators, given a coarse graph selection. In this paper, we consider the complementary problem of learning to coarsen graphs for a multigrid solver, a necessary step in developing fully learnable AMG methods. We propose a method using a reinforcement learning (RL) agent based on graph neural networks (GNNs), which can learn to perform graph coarsening on small planar training graphs and then be applied to unstructured large planar graphs, assuming bounded node degree. We demonstrate that this method can produce better coarse graphs than existing algorithms, even as the graph size increases and other properties of the graph are varied. We also propose an efficient inference procedure for performing graph coarsening that results in linear time complexity in graph size.

ROJan 15, 2021
Local Navigation and Docking of an Autonomous Robot Mower using Reinforcement Learning and Computer Vision

Ali Taghibakhshi, Nathan Ogden, Matthew West

We demonstrate a successful navigation and docking control system for the John Deere Tango autonomous mower, using only a single camera as the input. This vision-only system is of interest because it is inexpensive, simple for production, and requires no external sensing. This is in contrast to existing systems that rely on integrated position sensors and global positioning system (GPS) technologies. To produce our system we combined a state-of-the-art object detection architecture, You Only Look Once (YOLO), with a reinforcement learning (RL) architecture, Double Deep QNetworks (Double DQN). The object detection network identifies features on the mower and passes its output to the RL network, providing it with a low-dimensional representation that enables rapid and robust training. Finally, the RL network learns how to navigate the machine to the desired spot in a custom simulation environment. When tested on mower hardware, the system is able to dock with centimeter-level accuracy from arbitrary initial locations and orientations.

LGDec 6, 2020
Unsupervised Regionalization of Particle-resolved Aerosol Mixing State Indices on the Global Scale

Zhonghua Zheng, Joseph Ching, Jeffrey H. Curtis et al.

The aerosol mixing state significantly affects the climate and health impacts of atmospheric aerosol particles. Simplified aerosol mixing state assumptions, common in Earth System models, can introduce errors in the prediction of these aerosol impacts. The aerosol mixing state index, a metric to quantify aerosol mixing state, is a convenient measure for quantifying these errors. Global estimates of aerosol mixing state indices have recently become available via supervised learning models, but require regionalization to ease spatiotemporal analysis. Here we developed a simple but effective unsupervised learning approach to regionalize predictions of global aerosol mixing state indices. We used the monthly average of aerosol mixing state indices global distribution as the input data. Grid cells were then clustered into regions by the k-means algorithm without explicit spatial information as input. This approach resulted in eleven regions over the globe with specific spatial aggregation patterns. Each region exhibited a unique distribution of mixing state indices and aerosol compositions, showing the effectiveness of the unsupervised regionalization approach. This study defines "aerosol mixing state zones" that could be useful for atmospheric science research.

LGApr 1, 2020
Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

Yu Wang, Nima Roohi, Matthew West et al.

Probabilistic Computation Tree Logic (PCTL) is frequently used to formally specify control objectives such as probabilistic reachability and safety. In this work, we focus on model checking PCTL specifications statistically on Markov Decision Processes (MDPs) by sampling, e.g., checking whether there exists a feasible policy such that the probability of reaching certain goal states is greater than a threshold. We use reinforcement learning to search for such a feasible policy for PCTL specifications, and then develop a statistical model checking (SMC) method with provable guarantees on its error. Specifically, we first use upper-confidence-bound (UCB) based Q-learning to design an SMC algorithm for bounded-time PCTL specifications, and then extend this algorithm to unbounded-time specifications by identifying a proper truncation time by checking the PCTL specification and its negation at the same time. Finally, we evaluate the proposed method on case studies.

OCAug 21, 2019
A tree-based radial basis function method for noisy parallel surrogate optimization

Chenchao Shou, Matthew West

Parallel surrogate optimization algorithms have proven to be efficient methods for solving expensive noisy optimization problems. In this work we develop a new parallel surrogate optimization algorithm (ProSRS), using a novel tree-based "zoom strategy" to improve the efficiency of the algorithm. We prove that if ProSRS is run for sufficiently long, with probability converging to one there will be at least one point among all the evaluations that will be arbitrarily close to the global minimum. We compare our algorithm to several state-of-the-art Bayesian optimization algorithms on a suite of standard benchmark functions and two real machine learning hyperparameter-tuning problems. We find that our algorithm not only achieves significantly faster optimization convergence, but is also 1-4 orders of magnitude cheaper in computational cost.

NAJan 5, 2006
Discrete Routh Reduction

Sameer M. Jalnapurkar, Melvin Leok, Jerrold E. Marsden et al.

This paper develops the theory of abelian Routh reduction for discrete mechanical systems and applies it to the variational integration of mechanical systems with abelian symmetry. The reduction of variational Runge-Kutta discretizations is considered, as well as the extent to which symmetry reduction and discretization commute. These reduced methods allow the direct simulation of dynamical features such as relative equilibria and relative periodic orbits that can be obscured or difficult to identify in the unreduced dynamics. The methods are demonstrated for the dynamics of an Earth orbiting satellite with a non-spherical $J_2$ correction, as well as the double spherical pendulum. The $J_2$ problem is interesting because in the unreduced picture, geometric phases inherent in the model and those due to numerical discretization can be hard to distinguish, but this issue does not appear in the reduced algorithm, where one can directly observe interesting dynamical structures in the reduced phase space (the cotangent bundle of shape space), in which the geometric phases have been removed. The main feature of the double spherical pendulum example is that it has a nontrivial magnetic term in its reduced symplectic form. Our method is still efficient as it can directly handle the essential non-canonical nature of the symplectic structure. In contrast, a traditional symplectic method for canonical systems could require repeated coordinate changes if one is evoking Darboux' theorem to transform the symplectic structure into canonical form, thereby incurring additional computational cost. Our method allows one to design reduced symplectic integrators in a natural way, despite the noncanonical nature of the symplectic structure.