Eduardo D. Sontag

LG
h-index32
13papers
18citations
Novelty48%
AI Score50

13 Papers

SYAug 2, 2012
Logarithmic Lipschitz norms and diffusion-induced instability

Zahra Aminzare, Eduardo D. Sontag

This paper proves that contractive ordinary differential equation systems remain contractive when diffusion is added. Thus, diffusive instabilities, in the sense of the Turing phenomenon, cannot arise for such systems. An important biochemical system is shown to satisfy the required conditions.

SYJan 30, 2016
A remark on incoherent feedforward circuits as change detectors and feedback controllers

Eduardo D. Sontag

This note analyzes incoherent feedforward loops in signal processing and control. It studies the response properties of IFFL's to exponentially growing inputs, both for a standard version of the IFFL and for a variation in which the output variable has a positive self-feedback term. It also considers a negative feedback configuration, using such a device as a controller. It uncovers a somewhat surprising phenomenon in which stabilization is only possible in disconnected regions of parameter space, as the controlled system's growth rate is varied.

LGOct 3, 2023
Why should autoencoders work?

Matthew D. Kvalheim, Eduardo D. Sontag

Deep neural network autoencoders are routinely used computationally for model reduction. They allow recognizing the intrinsic dimension of data that lie in a $k$-dimensional subset $K$ of an input Euclidean space $\mathbb{R}^n$. The underlying idea is to obtain both an encoding layer that maps $\mathbb{R}^n$ into $\mathbb{R}^k$ (called the bottleneck layer or the space of latent variables) and a decoding layer that maps $\mathbb{R}^k$ back into $\mathbb{R}^n$, in such a way that the input data from the set $K$ is recovered when composing the two maps. This is achieved by adjusting parameters (weights) in the network to minimize the discrepancy between the input and the reconstructed output. Since neural networks (with continuous activation functions) compute continuous maps, the existence of a network that achieves perfect reconstruction would imply that $K$ is homeomorphic to a $k$-dimensional subset of $\mathbb{R}^k$, so clearly there are topological obstructions to finding such a network. On the other hand, in practice the technique is found to "work" well, which leads one to ask if there is a way to explain this effectiveness. We show that, up to small errors, indeed the method is guaranteed to work. This is done by appealing to certain facts from differential topology. A computational example is also included to illustrate the ideas.

SYMar 19
Remarks on Lipschitz-Minimal Interpolation: Generalization Bounds and Neural Network Implementation

Arthur C. B. de Oliveira, Ruigang Wang, Ian R. Manchester et al.

This note establishes a theoretical framework for finding (potentially overparameterized) approximations of a function on a compact set with a-priori bounds for the generalization error. The approximation method considered is to choose, among all functions that (approximately) interpolate a given data set, one with a minimal Lipschitz constant. The paper establishes rigorous generalization bounds over practically relevant classes of approximators, including deep neural networks. It also presents a neural network implementation based on Lipschitz-bounded network layers and an augmented Lagrangian method. The results are illustrated for a problem of learning the dynamics of an input-to-state stable system with certified bounds on simulation error.

OCAug 30, 2024
Exact Recovery Guarantees for Parameterized Nonlinear System Identification Problem under Sparse Disturbances or Semi-Oblivious Attacks

Haixiang Zhang, Baturalp Yalcin, Javad Lavaei et al.

In this work, we study the problem of learning a nonlinear dynamical system by parameterizing its dynamics using basis functions. We assume that disturbances occur at each time step with an arbitrary probability $p$, which models the sparsity level of the disturbance vectors over time. These disturbances are drawn from an arbitrary, unknown probability distribution, which may depend on past disturbances, provided that it satisfies a zero-mean assumption. The primary objective of this paper is to learn the system's dynamics within a finite time and analyze the sample complexity as a function of $p$. To achieve this, we examine a LASSO-type non-smooth estimator, and establish necessary and sufficient conditions for its well-specifiedness and the uniqueness of the global solution to the underlying optimization problem. We then provide exact recovery guarantees for the estimator under two distinct conditions: boundedness and Lipschitz continuity of the basis functions. We show that finite-time exact recovery is achieved with high probability, even when $p$ approaches 1. Unlike prior works, which primarily focus on independent and identically distributed (i.i.d.) disturbances and provide only asymptotic guarantees for system learning, this study presents the first finite-time analysis of nonlinear dynamical systems under a highly general disturbance model. Our framework allows for possible temporal correlations in the disturbances and accommodates semi-oblivious adversarial attacks, significantly broadening the scope of existing theoretical results.

LGNov 6, 2025
Autoencoding Dynamics: Topological Limitations and Capabilities

Matthew D. Kvalheim, Eduardo D. Sontag

Given a "data manifold" $M\subset \mathbb{R}^n$ and "latent space" $\mathbb{R}^\ell$, an autoencoder is a pair of continuous maps consisting of an "encoder" $E\colon \mathbb{R}^n\to \mathbb{R}^\ell$ and "decoder" $D\colon \mathbb{R}^\ell\to \mathbb{R}^n$ such that the "round trip" map $D\circ E$ is as close as possible to the identity map $\mbox{id}_M$ on $M$. We present various topological limitations and capabilites inherent to the search for an autoencoder, and describe capabilities for autoencoding dynamical systems having $M$ as an invariant manifold.

DSApr 2
When is cumulative dose response monotonic? Analysis of incoherent feedforward motifs

Moh Kamalul Wafi, Arthur C. B. de Oliveira, Eduardo D. Sontag

We study the monotonicity of the cumulative dose response (cDR) for a class of incoherent feedforward motifs (IFFM) systems with linear intermediate dynamics and nonlinear output dynamics. While the instantaneous dose response (DR) may be nonmonotone with respect to the input, the cDR can still be monotone. To analyze this phenomenon, we derive an integral representation of the sensitivity of cDR with respect to the input and establish general sufficient conditions for both monotonicity and non-monotonicity. These results reduce the problem to verifying qualitative sign properties along system trajectories. We apply this framework to four canonical IFFM systems and obtain a complete characterization of their behavior. In particular, IFFM1 and IFFM3 exhibit monotone cDR despite potentially non-monotone DR, while IFFM2 is monotone already at the level of DR, which implies monotonicity of cDR. In contrast, IFFM4 violates these conditions, leading to a loss of monotonicity. Numerical simulations indicate that these properties persist beyond the structured initial conditions used in the analysis. Overall, our results provide a unified framework for understanding how network structure governs monotonicity in cumulative input-output responses.

OCApr 30
Boundedness of solutions in feedback systems with antithetic controllers

Moh Kamalul Wafi, Arthur C. B. de Oliveira, Eduardo D. Sontag

This paper studies whether solutions of a class of nonlinear feedback systems remain bounded over time. The systems we consider arise naturally in synthetic biology, where the antithetic feedback controller regulates a biological process through a delayed feedback loop. Our main result is that every trajectory of such a system is bounded. The key insight is simple: if the regulated state grows too large for too long, the feedback loop will eventually respond and push it back down. More precisely, we show that whenever the state exceeds a threshold and remains there long enough, the feedback signal becomes strong enough to force the state to decrease. We then show that once this happens, the feedback remains strong enough to keep the state from growing unbounded. The proof works directly with differential inequalities and does not require constructing a Lyapunov function, making the mechanism transparent and easy to interpret. The boundedness result can be understood as a time-domain small-gain effect, where the delayed feedback ultimately counteracts any persistent growth in the system.

OCMar 31, 2025
Remarks on the Polyak-Lojasiewicz inequality and the convergence of gradient systems

Arthur Castello B. de Oliveira, Leilei Cui, Eduardo D. Sontag

This work explores generalizations of the Polyak-Lojasiewicz inequality (PLI) and their implications for the convergence behavior of gradient flows in optimization problems. Motivated by the continuous-time linear quadratic regulator (CT-LQR) policy optimization problem -- where only a weaker version of the PLI is characterized in the literature -- this work shows that while weaker conditions are sufficient for global convergence to, and optimality of the set of critical points of the cost function, the "profile" of the gradient flow solution can change significantly depending on which "flavor" of inequality the cost satisfies. After a general theoretical analysis, we focus on fitting the CT-LQR policy optimization problem to the proposed framework, showing that, in fact, it can never satisfy a PLI in its strongest form. We follow up our analysis with a brief discussion on the difference between continuous- and discrete-time LQR policy optimization, and end the paper with some intuition on the extension of this framework to optimization problems with L1 regularization and solved through proximal gradient flows.

LGSep 23, 2025
Modular Machine Learning with Applications to Genetic Circuit Composition

Jichi Wang, Eduardo D. Sontag, Domitilla Del Vecchio

In several applications, including in synthetic biology, one often has input/output data on a system composed of many modules, and although the modules' input/output functions and signals may be unknown, knowledge of the composition architecture can significantly reduce the amount of training data required to learn the system's input/output mapping. Learning the modules' input/output functions is also necessary for designing new systems from different composition architectures. Here, we propose a modular learning framework, which incorporates prior knowledge of the system's compositional structure to (a) identify the composing modules' input/output functions from the system's input/output data and (b) achieve this by using a reduced amount of data compared to what would be required without knowledge of the compositional structure. To achieve this, we introduce the notion of modular identifiability, which allows recovery of modules' input/output functions from a subset of the system's input/output data, and provide theoretical guarantees on a class of systems motivated by genetic circuits. We demonstrate the theory on computational studies showing that a neural network (NNET) that accounts for the compositional structure can learn the composing modules' input/output functions and predict the system's output on inputs outside of the training set distribution. By contrast, a neural network that is agnostic of the structure is unable to predict on inputs that fall outside of the training set distribution. By reducing the need for experimental data and allowing module identification, this framework offers the potential to ease the design of synthetic biological circuits and of multi-module systems more generally.

LGJul 14, 2025
Some remarks on gradient dominance and LQR policy optimization

Eduardo D. Sontag

Solutions of optimization problems, including policy optimization in reinforcement learning, typically rely upon some variant of gradient descent. There has been much recent work in the machine learning, control, and optimization communities applying the Polyak-Łojasiewicz Inequality (PLI) to such problems in order to establish an exponential rate of convergence (a.k.a. ``linear convergence'' in the local-iteration language of numerical analysis) of loss functions to their minima under the gradient flow. Often, as is the case of policy iteration for the continuous-time LQR problem, this rate vanishes for large initial conditions, resulting in a mixed globally linear / locally exponential behavior. This is in sharp contrast with the discrete-time LQR problem, where there is global exponential convergence. That gap between CT and DT behaviors motivates the search for various generalized PLI-like conditions, and this talk will address that topic. Moreover, these generalizations are key to understanding the transient and asymptotic effects of errors in the estimation of the gradient, errors which might arise from adversarial attacks, wrong evaluation by an oracle, early stopping of a simulation, inaccurate and very approximate digital twins, stochastic computations (algorithm ``reproducibility''), or learning by sampling from limited data. We describe an ``input to state stability'' (ISS) analysis of this issue. The second part discusses convergence and PLI-like properties of ``linear feedforward neural networks'' in feedback control. Much of the work described here was done in collaboration with Arthur Castello B. de Oliveira, Leilei Cui, Zhong-Ping Jiang, and Milad Siami.

LGMay 17, 2023
On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations

Arthur Castello B. de Oliveira, Milad Siami, Eduardo D. Sontag

Recent research in neural networks and machine learning suggests that using many more parameters than strictly required by the initial complexity of a regression problem can result in more accurate or faster-converging models -- contrary to classical statistical belief. This phenomenon, sometimes known as ``benign overfitting'', raises questions regarding in what other ways might overparameterization affect the properties of a learning problem. In this work, we investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation. This uncertainty arises naturally if the gradient is estimated from noisy data or directly measured. Our object of study is a linear neural network with a single, arbitrarily wide, hidden layer and an arbitrary number of inputs and outputs. In this paper we solve the problem for the case where the input and output of our neural-network are one-dimensional, deriving sufficient conditions for robustness of our system based on necessary and sufficient conditions for convergence in the undisturbed case. We then show that the general overparametrized formulation introduces a set of spurious equilibria which lay outside the set where the loss function is minimized, and discuss directions of future work that might extend our current results for more general formulations.