LG · Jan 29
Exact closed-form Gaussian moments of residual layers
Simon Kuang, Xinfan Lin
We study the problem of propagating the mean and covariance of a general multivariate Gaussian distribution through a deep (residual) neural network using layer-by-layer moment matching. We close a longstanding gap by deriving exact moment matching for the probit, GeLU, ReLU (as a limit of GeLU), Heaviside (as a limit of probit), and sine activation functions, for both feedforward and generalized residual layers. On random networks, we find orders-of-magnitude improvements in the KL divergence error metric, up to a millionfold, over popular alternatives. On real data, we find competitive statistical calibration for inference under epistemic uncertainty in the input. On a variational Bayes network, we show that our method attains hundredfold improvements in KL divergence from Monte Carlo ground truth over a state-of-the-art deterministic inference method. We also give an a priori error bound and a preliminary analysis of stochastic feedforward neurons, which have recently attracted general interest.
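The abstract does not reproduce the formulas, so here is a minimal sketch under a loud simplifying assumption: a scalar input X ~ N(mu, sigma^2) rather than the paper's multivariate, residual setting. It compares standard closed-form Gaussian integrals for the probit, GeLU, and ReLU activations against Monte Carlo. The function names (`probit_mean`, `gelu_mean`, `relu_moments`, `monte_carlo`) are hypothetical and are not the authors' code; in particular, the ReLU moments here are the direct rectified-Gaussian formulas, not the paper's limit-of-GeLU derivation.

```python
import math
import random


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF, i.e. the probit activation."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_mean(mu, sigma):
    """E[Phi(X)] for X ~ N(mu, sigma^2)."""
    return Phi(mu / math.sqrt(1.0 + sigma ** 2))


def gelu_mean(mu, sigma):
    """E[X Phi(X)] for X ~ N(mu, sigma^2).

    Stein's lemma gives E[X Phi(X)] = mu E[Phi(X)] + sigma^2 E[phi(X)],
    and E[phi(X)] = phi(mu / s) / s with s = sqrt(1 + sigma^2).
    """
    s = math.sqrt(1.0 + sigma ** 2)
    return mu * Phi(mu / s) + sigma ** 2 * phi(mu / s) / s


def relu_moments(mu, sigma):
    """Mean and variance of max(X, 0) for X ~ N(mu, sigma^2), sigma > 0."""
    a = mu / sigma
    m1 = mu * Phi(a) + sigma * phi(a)
    m2 = (mu ** 2 + sigma ** 2) * Phi(a) + mu * sigma * phi(a)
    return m1, m2 - m1 ** 2


def monte_carlo(f, mu, sigma, n=100_000, seed=0):
    """Sample mean and variance of f(X), used here as a ground-truth check."""
    rng = random.Random(seed)
    ys = [f(rng.gauss(mu, sigma)) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, var


if __name__ == "__main__":
    mu, sigma = 0.3, 0.8
    print("ReLU exact:", relu_moments(mu, sigma),
          "MC:", monte_carlo(lambda x: max(x, 0.0), mu, sigma))
    print("GeLU exact mean:", gelu_mean(mu, sigma),
          "MC:", monte_carlo(lambda x: x * Phi(x), mu, sigma)[0])
    print("probit exact mean:", probit_mean(mu, sigma),
          "MC:", monte_carlo(Phi, mu, sigma)[0])
```

Layer-by-layer moment matching applies formulas of this kind to each neuron and then refits a Gaussian; the multivariate case also needs the cross-covariances between neurons, which this scalar sketch omits.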
SY · Nov 12, 2025
Assumed Density Filtering and Smoothing with Neural Network Surrogate Models
Simon Kuang, Xinfan Lin
The Kalman filter and Rauch-Tung-Striebel (RTS) smoother are optimal for state estimation in linear dynamic systems. For nonlinear systems, the challenge is how to propagate uncertainty through the state transitions and the output function. For the case of a neural network model, we enable accurate uncertainty propagation using a recent state-of-the-art analytic formula for the mean and covariance of a deep neural network with Gaussian input. We argue that cross entropy is a more appropriate performance metric than RMSE for evaluating the accuracy of filters and smoothers. We demonstrate the superiority of our method for state estimation on a stochastic Lorenz system and a Wiener system, and find that our method yields closer-to-optimal linear quadratic regulation when the state estimate is used for feedback.
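A minimal sketch of assumed density filtering, under loud simplifying assumptions: the state is scalar, the transition is the hypothetical model x' = sin(x) + w (a stand-in for a one-neuron network with a sine activation, whose Gaussian moments are known in closed form), and the measurement is linear, y = x + v. We also assume "cross entropy" means the Gaussian negative log-likelihood of the true state under the filter's estimate N(m, P). None of the function names come from the paper.

```python
import math
import random


def predict(m, P, q):
    """Moment-matched prediction through the hypothetical x' = sin(x) + w, w ~ N(0, q).

    For x ~ N(m, P): E[sin x] = sin(m) exp(-P/2) and
    E[sin^2 x] = (1 - cos(2m) exp(-2P)) / 2, so both moments are exact.
    """
    mean = math.sin(m) * math.exp(-P / 2.0)
    second = 0.5 * (1.0 - math.cos(2.0 * m) * math.exp(-2.0 * P))
    return mean, second - mean ** 2 + q


def monte_carlo_predict(m, P, q, n=100_000, seed=1):
    """Sampled mean and variance of sin(x) + w, for checking predict()."""
    rng = random.Random(seed)
    xs = [math.sin(rng.gauss(m, math.sqrt(P))) + rng.gauss(0.0, math.sqrt(q))
          for _ in range(n)]
    mean = sum(xs) / n
    return mean, sum((x - mean) ** 2 for x in xs) / (n - 1)


def update(m, P, y, r):
    """Kalman update for the linear measurement y = x + v, v ~ N(0, r)."""
    K = P / (P + r)
    return m + K * (y - m), (1.0 - K) * P


def gaussian_nll(x_true, m, P):
    """Cross-entropy (negative log-likelihood) of the true state under N(m, P)."""
    return 0.5 * math.log(2.0 * math.pi * P) + (x_true - m) ** 2 / (2.0 * P)


def run_filter(steps=200, q=0.05, r=0.1, seed=0):
    """Simulate the toy system and return (RMSE, mean NLL) of the filter."""
    rng = random.Random(seed)
    x, m, P = 0.5, 0.0, 1.0
    sq_err, nll = 0.0, 0.0
    for _ in range(steps):
        x = math.sin(x) + rng.gauss(0.0, math.sqrt(q))  # true state
        y = x + rng.gauss(0.0, math.sqrt(r))            # noisy observation
        m, P = predict(m, P, q)
        m, P = update(m, P, y, r)
        sq_err += (x - m) ** 2
        nll += gaussian_nll(x, m, P)
    return math.sqrt(sq_err / steps), nll / steps


if __name__ == "__main__":
    print("RMSE, mean NLL:", run_filter())
```

RMSE ignores the reported variance P entirely, whereas the negative log-likelihood penalizes a filter whose variance is too small or too large for its actual errors, which is one reason to prefer it when evaluating uncertainty-aware filters.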
LG · Mar 30
Lipschitz verification of neural networks through training
Simon Kuang, Yuezhu Xu, S. Sivaranjani et al.
The global Lipschitz constant of a neural network governs both adversarial robustness and generalization. Conventional approaches to "certified training" typically follow a train-then-verify paradigm: they train a network and then attempt to bound its Lipschitz constant. Because the efficient "trivial bound" (the product of the layerwise Lipschitz constants) is exponentially loose for arbitrary networks, these approaches must rely on computationally expensive techniques such as semidefinite programming, mixed-integer programming, or branch-and-bound. We propose a different paradigm: rather than designing complex verifiers for arbitrary networks, we design networks to be verifiable by the fast trivial bound. We show that directly penalizing the trivial bound during training forces it to become tight, thereby effectively regularizing the true Lipschitz constant. To achieve this, we identify three structural obstructions to a tight trivial bound (dead neurons, bias terms, and ill-conditioned weights) and introduce architectural mitigations, including a novel notion of norm-saturating polyactivations and bias-free sinusoidal layers. Our approach avoids the runtime complexity of advanced verification while achieving strong results: we train robust networks on MNIST with Lipschitz bounds that are small (orders of magnitude lower than comparable works) and tight (within 10% of the ground truth). The experimental results validate the theoretical guarantees, support the proposed mechanisms, and extend empirically to diverse activations and non-Euclidean norms.
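The trivial bound itself is simple to compute. The pure-Python sketch below, with hypothetical helper names, estimates each layer's spectral norm by power iteration, multiplies them into the trivial bound for a bias-free ReLU network, and compares it with an empirical lower bound from sampled input pairs. `penalized_loss` only illustrates the shape of a "task loss plus lambda times trivial bound" objective; the paper's actual penalty, its polyactivations and sinusoidal layers, and its training loop are not reproduced here.

```python
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matTvec(W, u):
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W, iters=200, seed=0):
    """Estimate ||W||_2 (largest singular value) by power iteration on W^T W."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(iters):
        v = matTvec(W, matvec(W, v))
        n = norm(v)
        if n == 0.0:
            return 0.0
        v = [x / n for x in v]
    return norm(matvec(W, v))


def trivial_bound(weights):
    """Product of layerwise Lipschitz constants for x -> W_L relu(... relu(W_1 x)).

    ReLU is 1-Lipschitz, so each layer contributes ||W_i||_2.
    """
    bound = 1.0
    for W in weights:
        bound *= spectral_norm(W)
    return bound


def forward(weights, x):
    """Bias-free ReLU network; no activation after the last layer."""
    for i, W in enumerate(weights):
        x = matvec(W, x)
        if i < len(weights) - 1:
            x = [max(0.0, t) for t in x]
    return x


def empirical_lower_bound(weights, pairs):
    """Largest observed ratio ||f(x) - f(y)|| / ||x - y||: a lower bound on Lip(f)."""
    best = 0.0
    for x, y in pairs:
        fx, fy = forward(weights, x), forward(weights, y)
        num = norm([a - b for a, b in zip(fx, fy)])
        den = norm([a - b for a, b in zip(x, y)])
        best = max(best, num / den)
    return best


def penalized_loss(task_loss, weights, lam):
    """Hypothetical training objective: task loss plus lam times the trivial bound."""
    return task_loss + lam * trivial_bound(weights)
```

With the diagonal weights `[[3, 0], [0, 1]]` and `[[2, 0], [0, 2]]`, the pair x = (1, 0), y = (0, 0) attains a ratio of 6, equal to the trivial bound, so the bound is tight for that network. For generic weights the gap can be large, which is what penalizing the bound during training is meant to close.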
SY · Apr 1
Incremental stability in $p=1$ and $p=\infty$: classification and synthesis
Simon Kuang, Xinfan Lin
All Lipschitz dynamics with the weak infinitesimal contraction (WIC) property can be expressed as a Lipschitz nonlinear system in proportional negative feedback. This statement, a "structure theorem," holds in the $p=1$ and $p=\infty$ norms. Equivalently, a Lipschitz vector field is WIC if and only if it can be written as a scalar decay plus a Lipschitz-bounded residual. We put this theorem to use by approximating Lipschitz functions with neural networks. This yields a map from unconstrained parameters to the set of WIC vector fields, enabling standard gradient-based training with no projections or penalty terms. Because the induced $1$- and $\infty$-norms of a matrix reduce to maximum absolute column and row sums, respectively, Lipschitz certification costs only $O(d^2)$ operations, the same order as a forward pass and appreciably cheaper than eigenvalue or semidefinite methods for the $2$-norm. Numerical experiments on a planar flow-fitting task and a four-node opinion network demonstrate that the parameterization (re-)constructs contracting dynamics from trajectory data. In a discussion of the expressiveness of non-Euclidean contraction, we prove that the set of $2\times 2$ systems that contract in a weighted $1$- or $\infty$-norm is characterized by an eigenvalue cone, a strict subset of the Hurwitz region that quantifies the cost of moving away from the Euclidean norm.
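A minimal sketch of the "scalar decay plus Lipschitz-bounded residual" form in the $\infty$-norm, assuming a hypothetical one-hidden-layer residual g(x) = W2 tanh(W1 x). The induced norms are computed as absolute row and column sums in O(d^2) time, the residual is rescaled so its Lipschitz bound does not exceed the decay rate gamma, and the matrix measure mu_inf of the Jacobian is checked to be non-positive, the standard infinitesimal condition for weak contraction in the $\infty$-norm. The rescaling rule `s = min(1, gamma / L)` and all function names are illustrative choices, not necessarily the paper's parameterization.

```python
import math


def induced_inf_norm(A):
    """||A||_inf = maximum absolute row sum; O(d^2) for a d x d matrix."""
    return max(sum(abs(a) for a in row) for row in A)


def induced_one_norm(A):
    """||A||_1 = maximum absolute column sum; O(d^2)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def log_norm_inf(A):
    """Matrix measure mu_inf(A) = max_i (a_ii + sum_{j != i} |a_ij|)."""
    n = len(A)
    return max(A[i][i] + sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def make_wic_field(W1, W2, gamma):
    """Hypothetical parameterization f(x) = -gamma x + s * W2 tanh(W1 x).

    tanh is 1-Lipschitz elementwise, so Lip_inf(W2 tanh(W1 .)) <= ||W2||_inf ||W1||_inf = L.
    Scaling by s = min(1, gamma / L) keeps the residual's Lipschitz bound <= gamma.
    Returns the vector field and its Jacobian.
    """
    L = induced_inf_norm(W2) * induced_inf_norm(W1)
    s = 1.0 if L <= gamma else gamma / L

    def f(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
        g = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
        return [-gamma * xi + s * gi for xi, gi in zip(x, g)]

    def jacobian(x):
        z = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
        d = [1.0 - math.tanh(zi) ** 2 for zi in z]  # tanh'(z), in (0, 1]
        n = len(x)
        return [[(-gamma if i == j else 0.0)
                 + s * sum(W2[i][k] * d[k] * W1[k][j] for k in range(len(W1)))
                 for j in range(n)] for i in range(n)]

    return f, jacobian
```

Any W1, W2, and gamma > 0 produce a valid field, so the weights can be trained without projection; the $p=1$ variant would use `induced_one_norm` and the column-sum matrix measure instead.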