ML Dec 16, 2025
From STLS to Projection-based Dictionary Selection in Sparse Regression for System Identification
Hangjun Cho, Fabio V. G. Amaral, Andrei A. Klishin et al.
In this work, we revisit dictionary-based sparse regression, in particular Sequential Threshold Least Squares (STLS), and propose score-guided library selection to provide practical guidance for data-driven modeling, with emphasis on SINDy-type algorithms. STLS is an algorithm for the $\ell_0$ sparse least-squares problem; it relies on splitting to solve the least-squares portion efficiently while handling the sparse term via proximal methods. It produces coefficient vectors whose components depend on both the projected reconstruction errors, here referred to as the scores, and the mutual coherence of dictionary terms. The first contribution of this work is a theoretical analysis of the score and the dictionary-selection strategy. This analysis can be understood in both the original and weak SINDy regimes. Second, numerical experiments on ordinary and partial differential equations highlight the effectiveness of score-based screening, which improves both accuracy and interpretability in dynamical system identification. These results suggest that using score-guided methods to refine the dictionary may, in some cases, help SINDy users improve the robustness of data-driven discovery of governing equations.
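As a rough illustration of the thresholding loop that STLS is commonly described with, the minimal Python sketch below alternates a least-squares solve with hard thresholding of small coefficients, and also computes the standard mutual coherence of a dictionary. The function names `stls` and `mutual_coherence`, the threshold value, and the toy cubic system are assumptions made for illustration; they are not taken from the paper, and the score-based screening itself is not reproduced here.

```python
import numpy as np

def stls(theta, dxdt, threshold=0.1, max_iter=10):
    """Hypothetical helper: sequentially thresholded least squares for one target."""
    # Initial dense least-squares fit over the full dictionary.
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # hard-threshold small coefficients
        active = ~small
        if not active.any():
            break
        # Refit using only the surviving dictionary columns.
        xi[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return xi

def mutual_coherence(theta):
    """Largest absolute normalized inner product between distinct columns."""
    normed = theta / np.linalg.norm(theta, axis=0)
    gram = np.abs(normed.T @ normed)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

# Toy data (assumed): dx/dt = -2 x + 0.5 x^3, dictionary [1, x, x^2, x^3].
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
dxdt = -2.0 * x + 0.5 * x**3

xi = stls(theta, dxdt, threshold=0.1)
mu = mutual_coherence(theta)
```

On this noise-free toy problem the loop keeps only the `x` and `x^3` columns. A high value of `mu` indicates strongly correlated dictionary terms, which is the situation where thresholding alone tends to be least reliable.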
STAT-MECH Jul 21, 2023
Data-Induced Interactions of Sparse Sensors Using Statistical Physics
Andrei A. Klishin, J. Nathan Kutz, Krithika Manohar
Large-dimensional empirical data in science and engineering frequently have a low-rank structure and can be represented as a combination of just a few eigenmodes. Because of this structure, we can use just a few spatially localized sensor measurements to reconstruct the full state of a complex system. The quality of this reconstruction, especially in the presence of sensor noise, depends significantly on the spatial configuration of the sensors. Multiple algorithms based on gappy interpolation and QR factorization have been proposed to optimize sensor placement. Here, instead of an algorithm that outputs a single "optimal" sensor configuration, we take a statistical mechanics view to compute the full landscape of sensor interactions induced by the training data. The two key advances of this paper are the recasting of the sensor placement landscape in an Ising model form and a regularized reconstruction that significantly decreases the reconstruction error when few sensors are used. In addition, we provide the first uncertainty quantification of the sparse sensing reconstruction and raise open questions about the shape of the reconstruction risk curve. Mapping out these data-induced sensor interactions allows combining them with external selection criteria and anticipating the impact of sensor replacement.
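To make the QR-based baseline mentioned above concrete, here is a minimal Python sketch of greedy column-pivoted selection of sensor locations from a mode matrix, followed by least-squares reconstruction of the full state. It illustrates only the standard pivoting idea, not the Ising-model landscape or the regularized reconstruction of the paper; the names `pivoted_sensors` and `reconstruct` and the random orthonormal modes are assumptions for illustration.

```python
import numpy as np

def pivoted_sensors(psi, n_sensors):
    """Hypothetical helper: greedy column pivoting on psi.T (pivoted Gram-Schmidt).

    psi has shape (n_locations, n_modes); returns indices of chosen locations.
    """
    r = psi.T.astype(float).copy()           # columns correspond to locations
    available = np.ones(r.shape[1], dtype=bool)
    chosen = []
    for _ in range(n_sensors):
        norms = np.where(available, np.linalg.norm(r, axis=0), -np.inf)
        j = int(np.argmax(norms))            # location with largest residual norm
        chosen.append(j)
        available[j] = False
        q = r[:, j] / np.linalg.norm(r[:, j])
        r -= np.outer(q, q @ r)              # remove the chosen direction
    return chosen

def reconstruct(psi, sensors, y):
    """Least-squares estimate of the full state from point measurements y."""
    coeffs = np.linalg.lstsq(psi[sensors], y, rcond=None)[0]
    return psi @ coeffs

# Toy low-rank state (assumed): 3 orthonormal modes over 50 locations.
rng = np.random.default_rng(0)
psi = np.linalg.qr(rng.standard_normal((50, 3)))[0]
x_true = psi @ np.array([1.0, -0.5, 2.0])

sensors = pivoted_sensors(psi, n_sensors=3)
x_hat = reconstruct(psi, sensors, x_true[sensors])
```

With noise-free measurements and as many sensors as modes, the reconstruction is exact up to round-off. Adding noise to the measurements is where the choice of sensor locations starts to matter for the reconstruction error.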
STAT-MECH Mar 4, 2024
Statistical Mechanics of Dynamical System Identification
Andrei A. Klishin, Joseph Bakarji, J. Nathan Kutz et al.
Recovering dynamical equations from observed noisy data is the central challenge of system identification. We develop a statistical mechanics approach to analyze sparse equation discovery algorithms, which typically balance data fit and parsimony via hyperparameter tuning. In this framework, statistical mechanics offers tools to analyze the interplay between complexity and fitness, similar to that between entropy and energy in physical systems. To establish this analogy, we define the hyperparameter optimization procedure as a two-level Bayesian inference problem that separates variable selection from coefficient inference and enables the computation of the posterior parameter distribution in closed form. Our approach provides uncertainty quantification, which is crucial in the low-data limit frequently encountered in real-world applications. A key advantage of employing statistical mechanical concepts, such as free energy and the partition function, is that they connect the large-data limit to the thermodynamic limit and characterize the sparsity- and noise-induced phase transitions that delineate correct from incorrect identification. We thus provide a method for closed-loop inference, estimating the noise in a given model and checking whether the model is tolerant to that amount of noise. This perspective on sparse equation discovery is versatile and can be adapted to various other equation discovery algorithms.
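The coefficient-inference level can be illustrated with textbook Bayesian linear regression: once a set of active dictionary terms is fixed, a Gaussian prior and Gaussian noise give a closed-form Gaussian posterior over the coefficients. The Python sketch below shows only that generic calculation. The isotropic prior, the name `gaussian_posterior`, the parameter `prior_std`, and the toy data are assumptions for illustration and may differ from the formulation used in the paper.

```python
import numpy as np

def gaussian_posterior(theta, y, support, noise_std, prior_std=10.0):
    """Hypothetical helper: posterior mean and covariance for a fixed support.

    Assumes y = theta[:, support] @ xi + Gaussian noise, with an isotropic
    zero-mean Gaussian prior on xi (an assumption of this sketch).
    """
    a = theta[:, support]
    precision = a.T @ a / noise_std**2 + np.eye(a.shape[1]) / prior_std**2
    cov = np.linalg.inv(precision)
    mean = cov @ a.T @ y / noise_std**2
    return mean, cov

# Toy low-data example (assumed): 40 noisy samples of dx/dt = -2 x + 0.5 x^3.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=40)
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
noise_std = 0.05
y = -2.0 * x + 0.5 * x**3 + noise_std * rng.standard_normal(x.size)

# Support [1, 3] selects the x and x^3 columns of the dictionary.
mean, cov = gaussian_posterior(theta, y, support=[1, 3], noise_std=noise_std)
post_std = np.sqrt(np.diag(cov))             # per-coefficient uncertainty
```

The diagonal of `cov` gives a per-coefficient uncertainty that grows as the number of samples shrinks or the noise level rises, which is the kind of quantity that matters in the low-data limit.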