LGOct 27, 2023
Causal disentanglement of multimodal dataElise Walker, Jonas A. Actor, Carianne Martinez et al.
Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; as achieving such interpretable representations is challenging, many causal learning algorithms utilize elements indicating prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such elements and prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our innovative algorithm utilizes a new differentiable parametrization to learn a directed acyclic graph (DAG) together with a latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each of the mixtures with an outcome of the DAG nodes; this novel identification enables feature discovery with causal relationships. Tested against a synthetic and a scientific dataset, our results demonstrate the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
38.8FLU-DYNMar 30
Data-informed lifting line theoryArjun Sharma, Jonas A. Actor, Peter A. Bosler
We present a data-driven framework that extends the predictive capability of classical lifting-line theory (LLT) to a wider aerodynamic regime by incorporating higher-fidelity aerodynamic data from panel method simulations. A neural network architecture with a convolutional layer followed by fully connected layers is developed, comprising two parallel subnetworks to separately process spanwise collocation points and global geometric/aerodynamic inputs such as angle of attack, chord, twist, airfoil distribution, and sweep. Among several configurations tested, this architecture is most effective in learning corrections to LLT outputs. The trained model captures higher-order three-dimensional effects in spanwise lift and drag distributions in regimes where LLT is inaccurate, such as low aspect ratios and high sweep, and generalizes well to wing configurations outside both the LLT regime and the training data range. The method retains LLT's computational efficiency, enabling integration into aerodynamic optimization loops and early-stage aircraft design studies. This approach offers a practical path for embedding high-fidelity corrections into low-order methods and may be extended to other aerodynamic prediction tasks, such as propeller performance.
LGMar 5
Multilevel Training for Kolmogorov Arnold NetworksBen S. Southworth, Jonas A. Actor, Graham Harper et al.
Algorithmic speedup of training common neural architectures is made difficult by the lack of structure guaranteed by the function compositions inherent to such networks. In contrast to multilayer perceptrons (MLPs), Kolmogorov-Arnold networks (KANs) provide more structure by expanding learned activations in a specified basis. This paper exploits this structure to develop practical algorithms and theoretical insights, yielding training speedup via multilevel training for KANs. To do so, we first establish an equivalence between KANs with spline basis functions and multichannel MLPs with power ReLU activations through a linear change of basis. We then analyze how this change of basis affects the geometry of gradient-based optimization with respect to spline knots. The KANs change-of-basis motivates a multilevel training approach, where we train a sequence of KANs naturally defined through a uniform refinement of spline knots with analytic geometric interpolation operators between models. The interpolation scheme enables a ``properly nested hierarchy'' of architectures, ensuring that interpolation to a fine model preserves the progress made on coarse models, while the compact support of spline basis functions ensures complementary optimization on subsequent levels. Numerical experiments demonstrate that our multilevel training approach can achieve orders of magnitude improvement in accuracy over conventional methods to train comparable KANs or MLPs, particularly for physics informed neural networks. Finally, this work demonstrates how principled design of neural networks can lead to exploitable structure, and in this case, multilevel algorithms that can dramatically improve training performance.
LGSep 4, 2025
Deriving Transformer Architectures as Implicit Multinomial RegressionJonas A. Actor, Anthony Gruber, Eric C. Cyr
While attention has been empirically shown to improve model performance, it lacks a rigorous mathematical justification. This short paper establishes a novel connection between attention mechanisms and multinomial regression. Specifically, we show that in a fixed multinomial regression setting, optimizing over latent features yields solutions that align with the dynamics induced on features by attention blocks. In other words, the evolution of representations through a transformer can be interpreted as a trajectory that recovers the optimal features for classification.
LGJun 3, 2025
Discovery of Probabilistic Dirichlet-to-Neumann Maps on GraphsAdrienne M. Propp, Jonas A. Actor, Elise Walker et al.
Dirichlet-to-Neumann maps enable the coupling of multiphysics simulations across computational subdomains by ensuring continuity of state variables and fluxes at artificial interfaces. We present a novel method for learning Dirichlet-to-Neumann maps on graphs using Gaussian processes, specifically for problems where the data obey a conservation constraint from an underlying partial differential equation. Our approach combines discrete exterior calculus and nonlinear optimal recovery to infer relationships between vertex and edge values. This framework yields data-driven predictions with uncertainty quantification across the entire graph, even when observations are limited to a subset of vertices and edges. By optimizing over the reproducing kernel Hilbert space norm while applying a maximum likelihood estimation penalty on kernel complexity, our method ensures that the resulting surrogate strictly enforces conservation laws without overfitting. We demonstrate our method on two representative applications: subsurface fracture networks and arterial blood flow. Our results show that the method maintains high accuracy and well-calibrated uncertainty estimates even under severe data scarcity, highlighting its potential for scientific applications where limited data and reliable uncertainty quantification are critical.
LGMay 23, 2025
Leveraging KANs for Expedient Training of Multichannel MLPs via Preconditioning and Geometric RefinementJonas A. Actor, Graham Harper, Ben Southworth et al.
Multilayer perceptrons (MLPs) are a workhorse machine learning architecture, used in a variety of modern deep learning frameworks. However, recently Kolmogorov-Arnold Networks (KANs) have become increasingly popular due to their success on a range of problems, particularly for scientific machine learning tasks. In this paper, we exploit the relationship between KANs and multichannel MLPs to gain structural insight into how to train MLPs faster. We demonstrate the KAN basis (1) provides geometric localized support, and (2) acts as a preconditioned descent in the ReLU basis, overall resulting in expedited training and improved accuracy. Our results show the equivalence between free-knot spline KAN architectures, and a class of MLPs that are refined geometrically along the channel dimension of each weight tensor. We exploit this structural equivalence to define a hierarchical refinement scheme that dramatically accelerates training of the multi-channel MLP architecture. We show further accuracy improvements can be had by allowing the $1$D locations of the spline knots to be trained simultaneously with the weights. These advances are demonstrated on a range of benchmark examples for regression and scientific machine learning.
LGFeb 6, 2025
Mixture of neural operator experts for learning boundary conditions and model selectionDwyer Deighan, Jonas A. Actor, Ravi G. Patel
While Fourier-based neural operators are best suited to learning mappings between functions on periodic domains, several works have introduced techniques for incorporating non trivial boundary conditions. However, all previously introduced methods have restrictions that limit their applicability. In this work, we introduce an alternative approach to imposing boundary conditions inspired by volume penalization from numerical methods and Mixture of Experts (MoE) from machine learning. By introducing competing experts, the approach additionally allows for model selection. To demonstrate the method, we combine a spatially conditioned MoE with the Fourier based, Modal Operator Regression for Physics (MOR-Physics) neural operator and recover a nonlinear operator on a disk and quarter disk. Next, we extract a large eddy simulation (LES) model from direct numerical simulation of channel flow and show the domain decomposition provided by our approach. Finally, we train our LES model with Bayesian variational inference and obtain posterior predictive samples of flow far past the DNS simulation time horizon.
LGOct 26, 2021
Polynomial-Spline Neural Networks with Exact IntegralsJonas A. Actor, Andy Huang, Nathaniel Trask
Using neural networks to solve variational problems, and other scientific machine learning tasks, has been limited by a lack of consistency and an inability to exactly integrate expressions involving neural network architectures. We address these limitations by formulating a novel neural network architecture that combines a polynomial mixture-of-experts model with free knot B1-spline basis functions. Effectively, our architecture performs piecewise polynomial approximation on each cell of a trainable partition of unity. Our architecture exhibits both $h$- and $p$- refinement for regression problems at the convergence rates expected from approximation theory, allowing for consistency in solving variational problems. Moreover, this architecture, its moments, and its partial derivatives can all be integrated exactly, obviating a reliance on sampling or quadrature and enabling error-free computation of variational forms. We demonstrate the success of our network on a range of regression and variational problems that illustrate the consistency and exact integrability of our network architecture.
IVApr 21, 2021
PocketNet: A Smaller Neural Network for Medical Image AnalysisAdrian Celaya, Jonas A. Actor, Rajarajeswari Muthusivarajan et al.
Medical imaging deep learning models are often large and complex, requiring specialized hardware to train and evaluate these models. To address such issues, we propose the PocketNet paradigm to reduce the size of deep learning models by throttling the growth of the number of channels in convolutional neural networks. We demonstrate that, for a range of segmentation and classification tasks, PocketNet architectures produce results comparable to that of conventional neural networks while reducing the number of parameters by multiple orders of magnitude, using up to 90% less GPU memory, and speeding up training times by up to 40%, thereby allowing such models to be trained and deployed in resource-constrained settings.