85.4AIMay 28
Physically Viable World Models: A Case for Query-Conditioned Embodied AIAdam J. Thorpe, Stepan Tretiakov, Cheng-Hsi Hsiao et al.
World models for embodied AI must be physically viable: constructed to answer intervention queries by representing the physical structure governing action outcomes, rather than merely predicting future observations. Existing observation-predictive world models can produce visually plausible but physically wrong rollouts. This failure is structural; distinct physical systems can look identical yet diverge under intervention. We expose this problem with controlled benchmarks that fix the visible scene while varying latent physics. We show that such models may recommend infeasible actions, mispredict interaction outcomes, or certify unsafe behavior. We argue that embodied AI requires world models that identify the simplest physical abstraction sufficient to answer an intervention query. Such a model comprises modular components, including environment representation, latent state and parameter estimation, action specification, interventional dynamics, and query-level response. An autonomous orchestrator should identify the relevant abstraction and compose compatible learned and structured components per query. When closed-form physics is unavailable, uncertain, or costly, the transition model may be analytic, simulated, learned, or hybrid, but it must preserve the structure that determines interventional outcomes. This decomposition makes the model interpretable, its components verifiable, and its outputs auditable against the query. It also provides a design principle for new world models and a feasibility test for existing ones: the right abstraction is not the most detailed model of the world, but the simplest model that preserves the distinctions relevant to the query. We demonstrate this approach on queries that existing systems fail to answer correctly, and outline how an orchestrator can dynamically assemble and adapt physically viable models for planning, control, and verification.
LGNov 18, 2022Code
GNS: A generalizable Graph Neural Network-based simulator for particulate and fluid modelingKrishna Kumar, Joseph Vantassel
We develop a PyTorch-based Graph Network Simulator (GNS) that learns physics and predicts the flow behavior of particulate and fluid systems. GNS discretizes the domain with nodes representing a collection of material points and the links connecting the nodes representing the local interaction between particles or clusters of particles. The GNS learns the interaction laws through message passing on the graph. GNS has three components: (a) Encoder, which embeds particle information to a latent graph, the edges are learned functions; (b) Processor, which allows data propagation and computes the nodal interactions across steps; and (c) Decoder, which extracts the relevant dynamics (e.g., particle acceleration) from the graph. We introduce physics-inspired simple inductive biases, such as an inertial frame that allows learning algorithms to prioritize one solution (constant gravitational acceleration) over another, reducing learning time. The GNS implementation uses semi-implicit Euler integration to update the next state based on the predicted accelerations. GNS trained on trajectory data is generalizable to predict particle kinematics in complex boundary conditions not seen during training. The trained model accurately predicts within a 5\% error of its associated material point method (MPM) simulation. The predictions are 5,000x faster than traditional MPM simulations (2.5 hours for MPM simulations versus 20 s for GNS simulation of granular flow). GNS surrogates are popular for solving optimization, control, critical-region prediction for in situ viz, and inverse-type problems. The GNS code is available under the open-source MIT license at https://github.com/geoelements/gns.
LGSep 30, 2024
Basis-to-Basis Operator Learning Using Function EncodersTyler Ingebrand, Adam J. Thorpe, Somdatta Goswami et al.
We present Basis-to-Basis (B2B) operator learning, a novel approach for learning operators on Hilbert spaces of functions based on the foundational ideas of function encoders. We decompose the task of learning operators into two parts: learning sets of basis functions for both the input and output spaces and learning a potentially nonlinear mapping between the coefficients of the basis functions. B2B operator learning circumvents many challenges of prior works, such as requiring data to be at fixed locations, by leveraging classic techniques such as least squares to compute the coefficients. It is especially potent for linear operators, where we compute a mapping between bases as a single matrix transformation with a closed-form solution. Furthermore, with minimal modifications and using the deep theoretical connections between function encoders and functional analysis, we derive operator learning algorithms that are directly analogous to eigen-decomposition and singular value decomposition. We empirically validate B2B operator learning on seven benchmark operator learning tasks and show that it demonstrates a two-orders-of-magnitude improvement in accuracy over existing approaches on several benchmark tasks.
LGJul 19, 2022
A Frequency-Velocity CNN for Developing Near-Surface 2D Vs Images from Linear-Array, Active-Source Wavefield MeasurementsAser Abbas, Joseph P. Vantassel, Brady R. Cox et al.
This paper presents a frequency-velocity convolutional neural network (CNN) for rapid, non-invasive 2D shear wave velocity (Vs) imaging of near-surface geo-materials. Operating in the frequency-velocity domain allows for significant flexibility in the linear-array, active-source experimental testing configurations used for generating the CNN input, which are normalized dispersion images. Unlike wavefield images, normalized dispersion images are relatively insensitive to the experimental testing configuration, accommodating various source types, source offsets, numbers of receivers, and receiver spacings. We demonstrate the effectiveness of the frequency-velocity CNN by applying it to a classic near-surface geophysics problem, namely, imaging a two-layer, undulating, soil-over-bedrock interface. This problem was recently investigated in our group by developing a time-distance CNN, which showed great promise but lacked flexibility in utilizing different field-testing configurations. Herein, the new frequency-velocity CNN is shown to have comparable accuracy to the time-distance CNN while providing greater flexibility to handle varied field applications. The frequency-velocity CNN was trained, validated, and tested using 100,000 synthetic near-surface models. The ability of the proposed frequency-velocity CNN to generalize across various acquisition configurations is first tested using synthetic near-surface models with different acquisition configurations from that of the training set, and then applied to experimental field data collected at the Hornsby Bend site in Austin, Texas, USA. When fully developed for a wider range of geological conditions, the proposed CNN may ultimately be used as a rapid, end-to-end alternative for current pseudo-2D surface wave imaging techniques or to develop starting models for full waveform inversion.
LGNov 16, 2022
Using explainability to design physics-aware CNNs for solving subsurface inverse problemsJodie Crocker, Krishna Kumar, Brady R. Cox
We present a novel method of using explainability techniques to design physics-aware neural networks. We demonstrate our approach by developing a convolutional neural network (CNN) for solving an inverse problem for shallow subsurface imaging. Although CNNs have gained popularity in recent years across many fields, the development of CNNs remains an art, as there are no clear guidelines regarding the selection of hyperparameters that will yield the best network. While optimization algorithms may be used to select hyperparameters automatically, these methods focus on developing networks with high predictive accuracy while disregarding model explainability (descriptive accuracy). However, the field of Explainable Artificial Intelligence (XAI) addresses the absence of model explainability by providing tools that allow developers to evaluate the internal logic of neural networks. In this study, we use the explainability methods Score-CAM and Deep SHAP to select hyperparameters, such as kernel sizes and network depth, to develop a physics-aware CNN for shallow subsurface imaging. We begin with a relatively deep Encoder-Decoder network, which uses surface wave dispersion images as inputs and generates 2D shear wave velocity subsurface images as outputs. Through model explanations, we ultimately find that a shallow CNN using two convolutional layers with an atypical kernel size of 3x1 yields comparable predictive accuracy but with increased descriptive accuracy. We also show that explainability methods can be used to evaluate the network's complexity and decision-making. We believe this method can be used to develop neural networks with high predictive accuracy while also providing inherent explainability.
GEO-PHDec 6, 2022
Quantification of geogrid lateral restraint using transparent sand and deep learning-based image segmentationDavid Marx, Krishna Kumar, Jorge Zornberg
An experimental technique is presented to quantify the lateral restraint provided by a geogrid embedded in granular soil at the particle level. Repeated load triaxial tests were done on transparent sand specimens with geosynthetic inclusions simulating geogrids. Particle outlines on laser illuminated planes through the specimens were segmented using a deep learning-based segmentation algorithm. The particle outlines were characterized in terms of Fourier shape descriptors and tracked across sequentially captured images. The accuracy of the particle displacement measurements was validated against Digital Image Correlation (DIC) measurements. In addition, the method's resolution and repeatability is presented. Based on the measured particle displacements and rotations, a state boundary line between probable and improbable particle motions was identified for each test. The size of the zone of probable motions could be used to quantify the lateral restraint provided by the inclusions. Overall, the tests results revealed that the geosynthetic inclusions restricted both particle displacements and rotations. However, the particle displacements were found to be restrained more significantly than the rotations. Finally, a unique relationship was found between the magnitude of the permanent strains of the specimens and the size of the zone of probable motions.
LGDec 19, 2025
Learning Generalizable Neural Operators for Inverse ProblemsAdam J. Thorpe, Stepan Tretiakov, Dibakar Roy Sarkar et al.
Inverse problems challenge existing neural operator architectures because ill-posed inverse maps violate continuity, uniqueness, and stability assumptions. We introduce B2B${}^{-1}$, an inverse basis-to-basis neural operator framework that addresses this limitation. Our key innovation is to decouple function representation from the inverse map. We learn neural basis functions for the input and output spaces, then train inverse models that operate on the resulting coefficient space. This structure allows us to learn deterministic, invertible, and probabilistic models within a single framework, and to choose models based on the degree of ill-posedness. We evaluate our approach on six inverse PDE benchmarks, including two novel datasets, and compare against existing invertible neural operator baselines. We learn probabilistic models that capture uncertainty and input variability, and remain robust to measurement noise due to implicit denoising in the coefficient calculation. Our results show consistent re-simulation performance across varying levels of ill-posedness. By separating representation from inversion, our framework enables scalable surrogate models for inverse problems that generalize across instances, domains, and degrees of ill-posedness.
SYNov 7, 2025
Zero-Shot Function Encoder-Based Differentiable Predictive ControlHassan Iqbal, Xingjian Li, Tyler Ingebrand et al.
We introduce a differentiable framework for zero-shot adaptive control over parametric families of nonlinear dynamical systems. Our approach integrates a function encoder-based neural ODE (FE-NODE) for modeling system dynamics with a differentiable predictive control (DPC) for offline self-supervised learning of explicit control policies. The FE-NODE captures nonlinear behaviors in state transitions and enables zero-shot adaptation to new systems without retraining, while the DPC efficiently learns control policies across system parameterizations, thus eliminating costly online optimization common in classical model predictive control. We demonstrate the efficiency, accuracy, and online adaptability of the proposed method across a range of nonlinear systems with varying parametric scenarios, highlighting its potential as a general-purpose tool for fast zero-shot adaptive control.
50.3LGApr 3
Neural Operators for Multi-Task Control and AdaptationDavid Sewell, Xingjian Li, Stepan Tretiakov et al.
Neural operator methods have emerged as powerful tools for learning mappings between infinite-dimensional function spaces, yet their potential in optimal control remains largely unexplored. We focus on multi-task control problems, whose solution is a mapping from task description (e.g., cost or dynamics functions) to optimal control law (e.g., feedback policy). We approximate these solution operators using a permutation-invariant neural operator architecture. Across a range of parametric optimal control environments and a locomotion benchmark, a single operator trained via behavioral cloning accurately approximates the solution operator and generalizes to unseen tasks, out-of-distribution settings, and varying amounts of task observations. We further show that the branch-trunk structure of our neural operator architecture enables efficient and flexible adaptation to new tasks. We develop structured adaptation strategies ranging from lightweight updates to full-network fine-tuning, achieving strong performance across different data and compute settings. Finally, we introduce meta-trained operator variants that optimize the initialization for few-shot adaptation. These methods enable rapid task adaptation with limited data and consistently outperform a popular meta-learning baseline. Together, our results demonstrate that neural operators provide a unified and efficient framework for multi-task control and adaptation.
CLApr 4, 2023
Geotechnical Parrot Tales (GPT): Harnessing Large Language Models in geotechnical engineeringKrishna Kumar
The widespread adoption of large language models (LLMs), such as OpenAI's ChatGPT, could revolutionize various industries, including geotechnical engineering. However, GPT models can sometimes generate plausible-sounding but false outputs, leading to hallucinations. In this article, we discuss the importance of prompt engineering in mitigating these risks and harnessing the full potential of GPT for geotechnical applications. We explore the challenges and pitfalls associated with LLMs and highlight the role of context in ensuring accurate and valuable responses. Furthermore, we examine the development of context-specific search engines and the potential of LLMs to become a natural interface for complex tasks, such as data analysis and design. We also develop a unified interface using natural language to handle complex geotechnical engineering tasks and data analysis. By integrating GPT into geotechnical engineering workflows, professionals can streamline their work and develop sustainable and resilient infrastructure systems for the future.
GRJun 25, 2022
Minority Report: A Graph Network Oracle for In Situ VisualizationKrishna Kumar, Paul Navrátil, Andrew Solis et al.
In situ visualization techniques are hampered by a lack of foresight: crucial simulation phenomena can be missed due to a poor sampling rate or insufficient detail at critical timesteps. Keeping a human in the loop is impractical, and defining statistical triggers can be difficult. This paper demonstrates the potential for using a machine-learning-based simulation surrogate as an oracle to identify expected critical regions of a large-scale simulation. These critical regions are used to drive the in situ analysis, providing greater data fidelity and analysis resolution with an equivalent I/O budget to a traditional in situ framework. We develop a distributed asynchronous in situ visualization by integrating TACC Galaxy with CB-Geo MPM for material point simulation of granular flows. We employ a PyTorch-based 3D Graph Network Simulator (GNS) trained on granular flow problems as an oracle to predict the dynamics of granular flows. Critical regions of interests are manually tagged in GNS for in situ rendering in MPM.
GEO-PHJun 15, 2022
A machine learning approach to predicting pore pressure response in liquefiable sands under cyclic loadingYongjin Choi, Krishna Kumar
Shear stress history controls the pore pressure response in liquefiable soils. The excess pore pressure does not increase under cyclic loading when shear stress amplitude is lower than the peak prior amplitude -- the shielding effect. Many sophisticated constitutive models fail to capture the shielding effect observed in the cyclic liquefaction experiments. We develop a data-driven machine learning model based on the LSTM neural network to capture the liquefaction response of soils under cyclic loading. The LSTM model is trained on 12 laboratory cyclic simple shear tests on Nevada sand in loose and dense conditions subjected to different cyclic simple shear loading conditions. The LSTM model features include the relative density of soil and the previous stress history to predict the pore water pressure response. The LSTM model successfully replicates the pore pressure response for three cyclic simple test results considering the shielding and density effects.
GEO-PHSep 23, 2023
Accelerating Particle and Fluid Simulations with Differentiable Graph Networks for Solving Forward and Inverse ProblemsKrishna Kumar, Yongjin Choi
We leverage physics-embedded differentiable graph network simulators (GNS) to accelerate particulate and fluid simulations to solve forward and inverse problems. GNS represents the domain as a graph with particles as nodes and learned interactions as edges. Compared to modeling global dynamics, GNS enables learning local interaction laws through edge messages, improving its generalization to new environments. GNS achieves over 165x speedup for granular flow prediction compared to parallel CPU numerical simulations. We propose a novel hybrid GNS/Material Point Method (MPM) to accelerate forward simulations by minimizing error on a pure surrogate model by interleaving MPM in GNS rollouts to satisfy conservation laws and minimize errors achieving 24x speedup compared to pure numerical simulations. The differentiable GNS enables solving inverse problems through automatic differentiation, identifying material parameters that result in target runout distances. We demonstrate the ability of GNS to solve inverse problems by iteratively updating the friction angle (a material property) by computing the gradient of a loss function based on the final and target runouts, thereby identifying the friction angle that best matches the observed runout. The physics-embedded and differentiable simulators open an exciting new paradigm for AI-accelerated design, control, and optimization.
GEO-PHNov 13, 2023
Three-dimensional granular flow simulation using graph neural network-based learned simulatorYongjin Choi, Krishna Kumar
Reliable evaluations of geotechnical hazards like landslides and debris flow require accurate simulation of granular flow dynamics. Traditional numerical methods can simulate the complex behaviors of such flows that involve solid-like to fluid-like transitions, but they are computationally intractable when simulating large-scale systems. Surrogate models based on statistical or machine learning methods are a viable alternative, but they are typically empirical and rely on a confined set of parameters in evaluating associated risks. Due to their permutation-dependent learning, conventional machine learning models require an unreasonably large amount of training data for building generalizable surrogate models. We employ a graph neural network (GNN), a novel deep learning technique, to develop a GNN-based simulator (GNS) for granular flows to address these issues. Graphs represent the state of granular flows and interactions, like the exchange of energy and momentum between grains, and GNN learns the local interaction law. GNS takes the current state of the granular flow and estimates the next state using Euler explicit integration. We train GNS on a limited set of granular flow trajectories and evaluate its performance in a three-dimensional granular column collapse domain. GNS successfully reproduces the overall behaviors of column collapses with various aspect ratios that were not encountered during training. The computation speed of GNS outperforms high-fidelity numerical simulators by 300 times.
13.1LGMar 17
Domain-informed explainable boosting machines for trustworthy lateral spread predictionsCheng-Hsi Hsiao, Krishna Kumar, Ellen M. Rathje
Explainable Boosting Machines (EBMs) provide transparent predictions through additive shape functions, enabling direct inspection of feature contributions. However, EBMs can learn non-physical relationships that reduce their reliability in natural hazard applications. This study presents a domain-informed framework to improve the physical consistency of EBMs for lateral spreading prediction. Our approach modifies learned shape functions based on domain knowledge. These modifications correct non-physical behavior while maintaining data-driven patterns. We apply the method to the 2011 Christchurch earthquake dataset and correct non-physical trends observed in the original EBM. The resulting model produces more physically consistent global and local explanations, with an acceptable tradeoff in accuracy (4--5\%).
SYSep 6, 2023
3D Object Positioning Using Differentiable Multimodal LearningSean Zanyk-McLean, Krishna Kumar, Paul Navratil
This article describes a multi-modal method using simulated Lidar data via ray tracing and image pixel loss with differentiable rendering to optimize an object's position with respect to an observer or some referential objects in a computer graphics scene. Object position optimization is completed using gradient descent with the loss function being influenced by both modalities. Typical object placement optimization is done using image pixel loss with differentiable rendering only, this work shows the use of a second modality (Lidar) leads to faster convergence. This method of fusing sensor input presents a potential usefulness for autonomous vehicles, as these methods can be used to establish the locations of multiple actors in a scene. This article also presents a method for the simulation of multiple types of data to be used in the training of autonomous vehicles.
GEO-PHApr 24, 2024
Explainable AI models for predicting liquefaction-induced lateral spreadingCheng-Hsi Hsiao, Krishna Kumar, Ellen Rathje
Earthquake-induced liquefaction can cause substantial lateral spreading, posing threats to infrastructure. Machine learning (ML) can improve lateral spreading prediction models by capturing complex soil characteristics and site conditions. However, the "black box" nature of ML models can hinder their adoption in critical decision-making. This study addresses this limitation by using SHapley Additive exPlanations (SHAP) to interpret an eXtreme Gradient Boosting (XGB) model for lateral spreading prediction, trained on data from the 2011 Christchurch Earthquake. SHAP analysis reveals the factors driving the model's predictions, enhancing transparency and allowing for comparison with established engineering knowledge. The results demonstrate that the XGB model successfully identifies the importance of soil characteristics derived from Cone Penetration Test (CPT) data in predicting lateral spreading, validating its alignment with domain understanding. This work highlights the value of explainable machine learning for reliable and informed decision-making in geotechnical engineering and hazard assessment.
GEO-PHOct 18, 2024
Machine Learning Aided Modeling of Granular Materials: A ReviewMengqi Wang, Krishna Kumar, Y. T. Feng et al.
Artificial intelligence (AI) has become a buzz word since Google's AlphaGo beat a world champion in 2017. In the past five years, machine learning as a subset of the broader category of AI has obtained considerable attention in the research community of granular materials. This work offers a detailed review of the recent advances in machine learning-aided studies of granular materials from the particle-particle interaction at the grain level to the macroscopic simulations of granular flow. This work will start with the application of machine learning in the microscopic particle-particle interaction and associated contact models. Then, different neural networks for learning the constitutive behaviour of granular materials will be reviewed and compared. Finally, the macroscopic simulations of practical engineering or boundary value problems based on the combination of neural networks and numerical methods are discussed. We hope readers will have a clear idea of the development of machine learning-aided modelling of granular materials via this comprehensive review work.
GEO-PHJan 17, 2024
Inverse analysis of granular flows using differentiable graph neural network simulatorYongjin Choi, Krishna Kumar
Inverse problems in granular flows, such as landslides and debris flows, involve estimating material parameters or boundary conditions based on target runout profile. Traditional high-fidelity simulators for these inverse problems are computationally demanding, restricting the number of simulations possible. Additionally, their non-differentiable nature makes gradient-based optimization methods, known for their efficiency in high-dimensional problems, inapplicable. While machine learning-based surrogate models offer computational efficiency and differentiability, they often struggle to generalize beyond their training data due to their reliance on low-dimensional input-output mappings that fail to capture the complete physics of granular flows. We propose a novel differentiable graph neural network simulator (GNS) by combining reverse mode automatic differentiation of graph neural networks with gradient-based optimization for solving inverse problems. GNS learns the dynamics of granular flow by representing the system as a graph and predicts the evolution of the graph at the next time step, given the current state. The differentiable GNS shows optimization capabilities beyond the training data. We demonstrate the effectiveness of our method for inverse estimation across single and multi-parameter optimization problems, including evaluating material properties and boundary conditions for a target runout distance and designing baffle locations to limit a landslide runout. Our proposed differentiable GNS framework offers an orders of magnitude faster solution to these inverse problems than the conventional finite difference approach to gradient-based optimization.
5.9LGMar 17
Formal verification of tree-based machine learning models for lateral spreadingKrishna Kumar
Machine learning models for geotechnical hazard prediction can achieve high accuracy while learning physically inconsistent relationships from sparse or biased training data. Current remedies (post-hoc explainability, such as SHAP and LIME, and training-time constraints) either diagnose individual predictions approximately or restrict model capacity without providing exhaustive guarantees. This paper encodes trained tree ensembles as logical formulas in a Satisfiability Modulo Theories (SMT) solver and checks physical specifications across the entire input domain, not just sampled points. Four geotechnical specifications (water table depth, PGA monotonicity, distance safety, and flat-ground safety) are formalized as decidable logical formulas and verified via SMT against both XGBoost ensembles and Explainable Boosting Machines (EBMs) trained on the 2011 Christchurch earthquake lateral spreading dataset (7,291 sites, four features). The SMT solver either produces a concrete counterexample where a specification fails or proves that no violation exists. The unconstrained EBM (80.1% accuracy) violates all four specifications. A fully constrained EBM (67.2%) satisfies three of four specifications, demonstrating that iterative constraint application guided by verification can progressively improve physical consistency. A Pareto analysis of 33 model variants reveals a persistent trade-off, as none of the variants studied achieve both greater than 80% accuracy and full compliance with the specified set. SHAP analysis of specification counterexamples shows that the offending feature can rank last, demonstrating that post-hoc explanations do not substitute for formal verification. These results establish a verify-fix-verify engineering loop and a formal certification for deploying physically consistent ML models in safety-critical geotechnical applications.
LGNov 7, 2025
Parameter-Efficient Conditioning for Material Generalization in Graph-Based SimulatorsNaveen Raj Manoharan, Hassan Iqbal, Krishna Kumar
Graph network-based simulators (GNS) have demonstrated strong potential for learning particle-based physics (such as fluids, deformable solids, and granular flows) while generalizing to unseen geometries due to their inherent inductive biases. However, existing models are typically trained for a single material type and fail to generalize across distinct constitutive behaviors, limiting their applicability in real-world engineering settings. Using granular flows as a running example, we propose a parameter-efficient conditioning mechanism that makes the GNS model adaptive to material parameters. We identify that sensitivity to material properties is concentrated in the early message-passing (MP) layers, a finding we link to the local nature of constitutive models (e.g., Mohr-Coulomb) and their effects on information propagation. We empirically validate this by showing that fine-tuning only the first few (1-5) of 10 MP layers of a pretrained model achieves comparable test performance as compared to fine-tuning the entire network. Building on this insight, we propose a parameter-efficient Feature-wise Linear Modulation (FiLM) conditioning mechanism designed to specifically target these early layers. This approach produces accurate long-term rollouts on unseen, interpolated, or moderately extrapolated values (e.g., up to 2.5 degrees for friction angle and 0.25 kPa for cohesion) when trained exclusively on as few as 12 short simulation trajectories from new materials, representing a 5-fold data reduction compared to a baseline multi-task learning method. Finally, we validate the model's utility by applying it to an inverse problem, successfully identifying unknown cohesion parameters from trajectory data. This approach enables the use of GNS in inverse design and closed-loop control tasks where material properties are treated as design variables.
SOFTApr 1, 2025
Towards scientific machine learning for granular material simulations -- challenges and opportunitiesMarc Fransen, Andreas Fürst, Deepak Tunuguntla et al.
Micro-scale mechanisms, such as inter-particle and particle-fluid interactions, govern the behaviour of granular systems. While particle-scale simulations provide detailed insights into these interactions, their computational cost is often prohibitive. Attended by researchers from both the granular materials (GM) and machine learning (ML) communities, a recent Lorentz Center Workshop on "Machine Learning for Discrete Granular Media" brought the ML community up to date with GM challenges. This position paper emerged from the workshop discussions. We define granular materials and identify seven key challenges that characterise their distinctive behaviour across various scales and regimes, ranging from gas-like to fluid-like and solid-like. Addressing these challenges is essential for developing robust and efficient digital twins for granular systems in various industrial applications. To showcase the potential of ML to the GM community, we present classical and emerging machine/deep learning techniques that have been, or could be, applied to granular materials. We reviewed sequence-based learning models for path-dependent constitutive behaviour, followed by encoder-decoder type models for representing high-dimensional data. We then explore graph neural networks and recent advances in neural operator learning. Lastly, we discuss model-order reduction and probabilistic learning techniques for high-dimensional parameterised systems, which are crucial for quantifying uncertainties arising from physics-based and data-driven models. We present a workflow aimed at unifying data structures and modelling pipelines and guiding readers through the selection, training, and deployment of ML surrogates for granular material simulations. Finally, we illustrate the workflow's practical use with two representative examples, focusing on granular materials in solid-like and fluid-like regimes.
LGMay 7, 2025
SetONet: A Set-Based Operator Network for Solving PDEs with Variable-Input SamplingStepan Tretiakov, Xingjian Li, Krishna Kumar
Neural operators, particularly the Deep Operator Network (DeepONet), have shown promise in learning mappings between function spaces for solving differential equations. However, standard DeepONet requires input functions to be sampled at fixed locations, limiting its applicability when sensor configurations vary or inputs exist on irregular grids. We introduce the Set Operator Network (SetONet), which modifies DeepONet's branch network to process input functions as unordered sets of location-value pairs. By incorporating Deep Sets principles, SetONet ensures permutation invariance while maintaining the same parameter count as the baseline. On classical operator-learning benchmarks, SetONet achieves parity with DeepONet on fixed layouts while sustaining accuracy under variable sensor configurations or sensor drop-off - conditions for which standard DeepONet is not applicable. More significantly, SetONet natively handles problems where inputs are naturally represented as unstructured point clouds (such as point sources or density samples) rather than values on fixed grids, a capability standard DeepONet lacks. On heat conduction with point sources, advection-diffusion modeling chemical plumes, and optimal transport between density samples, SetONet learns operators end-to-end without rasterization or multi-stage pipelines. These problems feature inputs that are naturally discrete point sets (point sources or density samples) rather than functions on fixed grids. SetONet is a DeepONet-class architecture that addresses such problems with a lightweight design, significantly broadening the applicability of operator learning to problems with variable, incomplete, or unstructured input data.
LGOct 10, 2025
Stability of Transformers under Layer NormalizationKelvin Kan, Xingjian Li, Benjamin J. Zhang et al.
Despite their widespread use, training deep Transformers can be unstable. Layer normalization, a standard component, improves training stability, but its placement has often been ad-hoc. In this paper, we conduct a principled study on the forward (hidden states) and backward (gradient) stability of Transformers under different layer normalization placements. Our theory provides key insights into the training dynamics: whether training drives Transformers toward regular solutions or pathological behaviors. For forward stability, we derive explicit bounds on the growth of hidden states in trained Transformers. For backward stability, we analyze how layer normalization affects the backpropagation of gradients, thereby explaining the training dynamics of each layer normalization placement. Our analysis also guides the scaling of residual steps in Transformer blocks, where appropriate choices can further improve stability and performance. Our numerical results corroborate our theoretical findings. Beyond these results, our framework provides a principled way to sanity-check the stability of Transformers under new architectural modifications, offering guidance for future designs.
OCSep 22, 2025
Zero-Shot Transferable Solution Method for Parametric Optimal Control ProblemsXingjian Li, Kelvin Kan, Deepanshu Verma et al.
This paper presents a transferable solution method for optimal control problems with varying objectives using function encoder (FE) policies. Traditional optimization-based approaches must be re-solved whenever objectives change, resulting in prohibitive computational costs for applications requiring frequent evaluation and adaptation. The proposed method learns a reusable set of neural basis functions that spans the control policy space, enabling efficient zero-shot adaptation to new tasks through either projection from data or direct mapping from problem specifications. The key idea is an offline-online decomposition: basis functions are learned once during offline imitation learning, while online adaptation requires only lightweight coefficient estimation. Numerical experiments across diverse dynamics, dimensions, and cost structures show our method delivers near-optimal performance with minimal overhead when generalizing across tasks, enabling semi-global feedback policies suitable for real-time deployment.
LGApr 15, 2025
MLPs and KANs for data-driven learning in physical problems: A performance comparisonRaghav Pant, Sikan Li, Xingjian Li et al.
There is increasing interest in solving partial differential equations (PDEs) by casting them as machine learning problems. Recently, there has been a spike in exploring Kolmogorov-Arnold Networks (KANs) as an alternative to traditional neural networks represented by Multi-Layer Perceptrons (MLPs). While showing promise, their performance advantages in physics-based problems remain largely unexplored. Several critical questions persist: Can KANs capture complex physical dynamics and under what conditions might they outperform traditional architectures? In this work, we present a comparative study of KANs and MLPs for learning physical systems governed by PDEs. We assess their performance when applied in deep operator networks (DeepONet) and graph network-based simulators (GNS), and test them on physical problems that vary significantly in scale and complexity. Drawing inspiration from the Kolmogorov Representation Theorem, we examine the behavior of KANs and MLPs across shallow and deep network architectures. Our results reveal that although KANs do not consistently outperform MLPs when configured as deep neural networks, they demonstrate superior expressiveness in shallow network settings, significantly outpacing MLPs in accuracy over our test cases. This suggests that KANs are a promising choice, offering a balance of efficiency and accuracy in applications involving physical systems.
54.3AIApr 2
De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory RulesKeerat Guliani, Deepkamal Gill, David Landsman et al.
Regulatory documents encode legally binding obligations that LLM-based systems must respect. Yet converting dense, hierarchically structured legal text into machine-readable rules remains a costly, expert-intensive process. We present De Jure, a fully automated, domain-agnostic pipeline for extracting structured regulatory rules from raw documents, requiring no human annotation, domain-specific prompting, or annotated gold data. De Jure operates through four sequential stages: normalization of source documents into structured Markdown; LLM-driven semantic decomposition into structured rule units; multi-criteria LLM-as-a-judge evaluation across 19 dimensions spanning metadata, definitions, and rule semantics; and iterative repair of low-scoring extractions within a bounded regeneration budget, where upstream components are repaired before rule units are evaluated. We evaluate De Jure across four models on three regulatory corpora spanning finance, healthcare, and AI governance. On the finance domain, De Jure yields consistent and monotonic improvement in extraction quality, reaching peak performance within three judge-guided iterations. De Jure generalizes effectively to healthcare and AI governance, maintaining high performance across both open- and closed-source models. In a downstream compliance question-answering evaluation via RAG, responses grounded in De Jure extracted rules are preferred over prior work in 73.8% of cases at single-rule retrieval depth, rising to 84.0% under broader retrieval, confirming that extraction fidelity translates directly into downstream utility. These results demonstrate that explicit, interpretable evaluation criteria can substitute for human annotation in complex regulatory domains, offering a scalable and auditable path toward regulation-grounded LLM alignment.
GEO-PHDec 30, 2025
Deep Learning in Geotechnical Engineering: A Critical Assessment of PINNs and Operator LearningKrishna Kumar
Deep learning methods -- physics-informed neural networks (PINNs), deep operator networks (DeepONet), and graph network simulators (GNS) -- are increasingly proposed for geotechnical problems. This paper tests these methods against traditional solvers on canonical problems: wave propagation and beam-foundation interaction. PINNs run 90,000 times slower than finite difference with larger errors. DeepONet requires thousands of training simulations and breaks even only after millions of evaluations. Multi-layer perceptrons fail catastrophically when extrapolating beyond training data -- the common case in geotechnical prediction. GNS shows promise for geometry-agnostic simulation but faces scaling limits and cannot capture path-dependent soil behavior. For inverse problems, automatic differentiation through traditional solvers recovers material parameters with sub-percent accuracy in seconds. We recommend: use automatic differentiation for inverse problems; apply site-based cross-validation to account for spatial autocorrelation; reserve neural networks for problems where traditional solvers are genuinely expensive and predictions remain within the training envelope. When a method is four orders of magnitude slower with less accuracy, it is not a viable replacement for proven solvers.
CVJul 11, 2025
From images to properties: a NeRF-driven framework for granular material parameter inversionCheng-Hsi Hsiao, Krishna Kumar
We introduce a novel framework that integrates Neural Radiance Fields (NeRF) with Material Point Method (MPM) simulation to infer granular material properties from visual observations. Our approach begins by generating synthetic experimental data, simulating an plow interacting with sand. The experiment is rendered into realistic images as the photographic observations. These observations include multi-view images of the experiment's initial state and time-sequenced images from two fixed cameras. Using NeRF, we reconstruct the 3D geometry from the initial multi-view images, leveraging its capability to synthesize novel viewpoints and capture intricate surface details. The reconstructed geometry is then used to initialize material point positions for the MPM simulation, where the friction angle remains unknown. We render images of the simulation under the same camera setup and compare them to the observed images. By employing Bayesian optimization, we minimize the image loss to estimate the best-fitting friction angle. Our results demonstrate that friction angle can be estimated with an error within 2 degrees, highlighting the effectiveness of inverse analysis through purely visual observations. This approach offers a promising solution for characterizing granular materials in real-world scenarios where direct measurement is impractical or impossible.
LGMar 17, 2025
Investigating the effect of CPT in lateral spreading prediction using Explainable AICheng-Hsi Hsiao, Ellen Rathje, Krishna Kumar
This study proposes an autoencoder approach to extract latent features from cone penetration test profiles to evaluate the potential of incorporating CPT data in an AI model. We employ autoencoders to compress 200 CPT profiles of soil behavior type index (Ic) and normalized cone resistance (qc1Ncs) into ten latent features while preserving critical information. We then utilize the extracted latent features with site parameters to train XGBoost models for predicting lateral spreading occurrences in the 2011 Christchurch earthquake. Models using the latent CPT features outperformed models with conventional CPT metrics or no CPT data, achieving over 83% accuracy. Explainable AI revealed the most crucial latent feature corresponding to soil behavior between 1-3 meter depths, highlighting this depth range's criticality for liquefaction evaluation. The autoencoder approach provides an automated technique for condensing CPT profiles into informative latent features for machine-learning liquefaction models.
SDMay 30, 2023
Audio classification using ML methodsKrishna Kumar
Machine Learning systems have achieved outstanding performance in different domains. In this paper machine learning methods have been applied to classification task to classify music genre. The code shows how to extract features from audio files and classify them using supervised learning into 2 genres namely classical and metal. Algorithms used are LogisticRegression, SVC using different kernals (linear, sigmoid, rbf and poly), KNeighborsClassifier , RandomForestClassifier, DecisionTreeClassifier and GaussianNB.
GEO-PHMay 9, 2023
Graph Neural Network-based surrogate model for granular flowsYongjin Choi, Krishna Kumar
Accurate simulation of granular flow dynamics is crucial for assessing various geotechnical risks, including landslides and debris flows. Granular flows involve a dynamic rearrangement of particles exhibiting complex transitions from solid-like to fluid-like responses. Traditional continuum and discrete numerical methods are limited by their computational cost in simulating large-scale systems. Statistical or machine learning-based models offer an alternative. Still, they are largely empirical, based on a limited set of parameters. Due to their permutation-dependent learning, traditional machine learning-based models require huge training data to generalize. To resolve these problems, we use a graph neural network, a state-of-the-art machine learning architecture that learns local interactions. Graphs represent the state of dynamically changing granular flows and the interaction laws, such as energy and momentum exchange between grains. We develop a graph neural network-based simulator (GNS) that takes the current state of granular flow and predicts the next state using Euler explicit integration by learning the local interaction laws. We train GNS on different granular trajectories. We then assess the performance of GNS by predicting granular column collapse. GNS accurately predicts flow dynamics for column collapses with different aspect ratios unseen during training. GNS is hundreds of times faster than high-fidelity numerical simulators. The model also generalizes to domains much larger than the training data, handling more than twice the number of particles than it was trained on.
SEAug 6, 2014
Early Development of UVM based Verification Environment of Image Signal Processing Designs using TLM Reference Model of RTLAbhishek Jain, Dr. Hima Gupta, Sandeep Jana et al.
With semiconductor industry trend of smaller the better, from an idea to a final product, more innovation on product portfolio and yet remaining competitive and profitable are few criteria which are culminating into pressure and need for more and more innovation for CAD flow, process management and project execution cycle. Project schedules are very tight and to achieve first silicon success is key for projects. This necessitates quicker verification with better coverage matrix. Quicker Verification requires early development of the verification environment with wider test vectors without waiting for RTL to be available. In this paper, we are presenting a novel approach of early development of reusable multi-language verification flow, by addressing four major activities of verification like Early creation of Executable Specification, Early creation of Verification Environment, Early development of test vectors and Better and increased Re-use of blocks. Although this paper focuses on early development of UVM based Verification Environment of Image Signal Processing designs using TLM Reference Model of RTL, same concept can be extended for non-image signal processing designs. Main Keywords are SystemVerilog, SystemC, Transaction Level Modeling, Universal Verification Methodology (UVM), Processor model, Universal Verification Component (UVC), Reference Model.