LGOct 23, 2023
Inferring Relational Potentials in Interacting SystemsArmand Comas-Massagué, Yilun Du, Christian Fernandez et al. · mit
Systems consisting of interacting agents are prevalent in the world, ranging from dynamical systems in physics to complex biological networks. To build systems which can interact robustly in the real world, it is thus important to be able to infer the precise interactions governing such systems. Existing approaches typically discover such interactions by explicitly modeling the feed-forward dynamics of the trajectories. In this work, we propose Neural Interaction Inference with Potentials (NIIP) as an alternative approach to discover such interactions that enables greater flexibility in trajectory modeling: it discovers a set of relational potentials, represented as energy functions, which when minimized reconstruct the original trajectory. NIIP assigns low energy to the subset of trajectories which respect the relational constraints observed. We illustrate that with these representations NIIP displays unique capabilities in test-time. First, it allows trajectory manipulation, such as interchanging interaction types across separately trained models, as well as trajectory forecasting. Additionally, it allows adding external hand-crafted potentials at test-time. Finally, NIIP enables the detection of out-of-distribution samples and anomalies without explicit training. Website: https://energy-based-model.github.io/interaction-potentials.
SYJun 5, 2013
Probabilistic Optimal Estimation and Filtering under UncertaintyFabrizio Dabbene, Mario Sznaier, Roberto Tempo
The classical approach to system identification is based on stochastic assumptions about the measurement error, and provides estimates that have random nature. Worst-case identification, on the other hand, only assumes the knowledge of deterministic error bounds, and establishes guaranteed estimates, thus being in principle better suited for the use in control design. However, a main limitation of such deterministic bounds lies on their potential conservatism, thus leading to estimates of restricted use. In this paper, we propose a rapprochement between the stochastic and worst-case paradigms. In particular, based on a probabilistic framework for linear estimation problems, we derive new computational results. These results combine elements from information-based complexity with recent developments in the theory of randomized algorithms. The main idea in this line of research is to "discard" sets of measure at most ε, where εis a probabilistic accuracy, from the set of deterministic estimates. Therefore, we are decreasing the so-called worst-case radius of information at the expense of a given probabilistic ``risk." In this setting, we compute a trade-off curve, called violation function, which shows how the radius of information decreases as a function of the accuracy. To this end, we construct randomized and deterministic algorithms which provide approximations of this function. We report extensive simulations showing numerical comparisons between the stochastic, worst-case and probabilistic approaches, thus demonstrating the efficacy of the methods proposed in this paper.
SYNov 18, 2023
On the Hardness of Learning to Stabilize Linear SystemsXiong Zeng, Zexiang Liu, Zhe Du et al.
Inspired by the work of Tsiamis et al. \cite{tsiamis2022learning}, in this paper we study the statistical hardness of learning to stabilize linear time-invariant systems. Hardness is measured by the number of samples required to achieve a learning task with a given probability. The work in \cite{tsiamis2022learning} shows that there exist system classes that are hard to learn to stabilize with the core reason being the hardness of identification. Here we present a class of systems that can be easy to identify, thanks to a non-degenerate noise process that excites all modes, but the sample complexity of stabilization still increases exponentially with the system dimension. We tie this result to the hardness of co-stabilizability for this class of systems using ideas from robust control.
CVDec 14, 2022
SAIF: Sparse Adversarial and Imperceptible Attack FrameworkTooba Imtiaz, Morgan Kohler, Jared Miller et al.
Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of calculated small distortion to images, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
25.9OCMar 27
Unsafe Probabilities and Risk Contours for Stochastic Processes using Convex OptimizationJared Miller, Matteo Tacchi, Didier Henrion et al.
This paper proposes an algorithm to calculate the maximal probability of unsafety with respect to trajectories of a stochastic process and a hazard set. The unsafe probability estimation problem is cast as a primal-dual pair of infinite-dimensional linear programs in occupation measures and continuous functions. This convex relaxation is nonconservative (to the true probability of unsafety) under compactness and regularity conditions in dynamics. The continuous-function linear program is linked to existing probability-certifying barrier certificates of safety. Risk contours for initial conditions of the stochastic process may be generated by suitably modifying the objective of the continuous-function program, forming an interpretable and visual representation of stochastic safety for test initial conditions. All infinite-dimensional linear programs are truncated to finite dimension by the Moment-Sum-of-Squares hierarchy of semidefinite programs. Unsafe-probability estimation and risk contours are generated for example stochastic processes.
85.0SYMay 14
Randomized Atomic Feature Models for Physics-Informed Identification of Dynamic SystemsRajiv Singh, Mario Sznaier, Lennart Ljung
We present a physics-informed framework for system identification based on randomized stable atomic features. Impulse responses are represented as random superpositions of stable atoms, namely damped complex exponentials associated with poles sampled inside a prescribed disk. Identification is then cast as a convex regularized least-squares problem with optional linear, second-order-cone, and KYP constraints. The approach generalizes random Fourier and random Laplace features to the damped, nonstationary regime relevant to engineering systems while retaining modal interpretability and scalable finite-dimensional computation. The main analytic point is an operator-theoretic Disk-Bochner viewpoint: positive measures over stable poles generate positive-definite kernels with a radius-dependent shift defect, while a converse scalar disk moment representation for an arbitrary kernel is characterized by subnormality of the canonical shift. We prove this statement, establish an RKHS-to-l1 embedding, show that sampled poles induce a valid finite atomic gauge, discuss random-feature convergence, and state sparse-recovery guarantees conditionally on the restricted-eigenvalue properties of the realized disk-Vandermonde or input-output design matrix. We also connect the normalized transfer function problem to Nevanlinna-Pick interpolation and LFT set-membership. The framework directly encodes stability margins, modal localization, DC-gain bounds, monotonicity, passivity, relative degree, settling-time targets, and time/frequency-domain error bounds. Numerical comparisons illustrate how physically meaningful priors can compensate for poor excitation and improve constrained impulse-response recovery in an under-informative data setting.
MLSep 6, 2019Code
Solving Interpretable Kernel Dimension ReductionChieh Wu, Jared Miller, Yale Chang et al.
Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace \textit{before} the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition. Recently, an efficient iterative spectral (eigendecomposition) method (ISM) has been proposed for this objective in the context of alternative clustering. However, ISM only provides theoretical guarantees for the Gaussian kernel. This greatly constrains ISM's usage since any kernel method using ISM is now limited to a single kernel. This work extends the theoretical guarantees of ISM to an entire family of kernels, thereby empowering ISM to solve any kernel method of the same objective. In identifying this family, we prove that each kernel within the family has a surrogate $Φ$ matrix and the optimal projection is formed by its most dominant eigenvectors. With this extension, we establish how a wide range of IKDR applications across different learning paradigms can be solved by ISM. To support reproducible results, the source code is made publicly available on \url{https://github.com/chieh-neu/ISM_supervised_DR}.
18.6OCMar 26
On incremental and semi-global exponential stability of gradient flows satisfying generalized Åojasiewicz inequalitiesAndreas Oliveira, Arthur C. B. de Oliveira, Mario Sznaier et al.
The Åojasiewicz inequality characterizes objective-value convergence along gradient flows and, in special cases, yields exponential decay of the cost. However, such results do not directly give rates of convergence in the state. In this paper, we use contraction theory to derive state-space guarantees for gradient systems satisfying generalized Åojasiewicz inequalities. We first show that, when the objective has a unique strongly convex minimizer, the generalized Åojasiewicz inequality implies semi-global exponential stability; on arbitrary compact subsets, this yields exponential stability. We then give two curvature-based sufficient conditions, together with constraints on the Åojasiewicz rate, under which the nonconvex gradient flow is globally incrementally exponentially stable.
CVApr 10, 2024
Solving Masked Jigsaw Puzzles with Diffusion Vision TransformersJinyang Liu, Wondmgezahu Teshome, Sandesh Ghimire et al.
Solving image and video jigsaw puzzles poses the challenging task of rearranging image fragments or video frames from unordered sequences to restore meaningful images and video sequences. Existing approaches often hinge on discriminative models tasked with predicting either the absolute positions of puzzle elements or the permutation actions applied to the original data. Unfortunately, these methods face limitations in effectively solving puzzles with a large number of elements. In this paper, we propose JPDVT, an innovative approach that harnesses diffusion transformers to address this challenge. Specifically, we generate positional information for image patches or video frames, conditioned on their underlying visual content. This information is then employed to accurately assemble the puzzle pieces in their correct positions, even in scenarios involving missing pieces. Our method achieves state-of-the-art performance on several datasets.
LGFeb 3, 2025
Generalization Error Analysis for Selective State-Space Models Through the Lens of AttentionArya Honarpisheh, Mustafa Bozdag, Octavia Camps et al.
State-space models (SSMs) have recently emerged as a compelling alternative to Transformers for sequence modeling tasks. This paper presents a theoretical generalization analysis of selective SSMs, the core architectural component behind the Mamba model. We derive a novel covering number-based generalization bound for selective SSMs, building upon recent theoretical advances in the analysis of Transformer models. Using this result, we analyze how the spectral abscissa of the continuous-time state matrix influences the model's stability during training and its ability to generalize across sequence lengths. We empirically validate our findings on a synthetic majority task, the IMDb sentiment classification benchmark, and the ListOps task, demonstrating how our theoretical insights translate into practical model behavior.
SYAug 23, 2025
Frequency Response Identification of Low-Order Systems: Finite-Sample AnalysisArya Honarpisheh, Mario Sznaier
This paper proposes a frequency-domain system identification method for learning low-order systems. The identification problem is formulated as the minimization of the l2 norm between the identified and measured frequency responses, with the nuclear norm of the Loewner matrix serving as a regularization term. This formulation results in an optimization problem that can be efficiently solved using standard convex optimization techniques. We derive an upper bound on the sampled-frequency complexity of the identification process and subsequently extend this bound to characterize the identification error over all frequencies. A detailed analysis of the sample complexity is provided, along with a thorough interpretation of its terms and dependencies. Finally, the efficacy of the proposed method is demonstrated through an example, and numerical simulations validating the growth rate of the sample complexity bound.
ROJul 12, 2025
Real-Time Adaptive Motion Planning via Point Cloud-Guided, Energy-Based Diffusion and Potential FieldsWondmgezahu Teshome, Kian Behzad, Octavia Camps et al.
Motivated by the problem of pursuit-evasion, we present a motion planning framework that combines energy-based diffusion models with artificial potential fields for robust real time trajectory generation in complex environments. Our approach processes obstacle information directly from point clouds, enabling efficient planning without requiring complete geometric representations. The framework employs classifier-free guidance training and integrates local potential fields during sampling to enhance obstacle avoidance. In dynamic scenarios, the system generates initial trajectories using the diffusion model and continuously refines them through potential field-based adaptation, demonstrating effective performance in pursuit-evasion scenarios with partial pursuer observability.
CVJun 4, 2024
3D-HGS: 3D Half-Gaussian SplattingHaolin Li, Jinyang Liu, Mario Sznaier et al.
Photo-realistic image rendering from 3D scene reconstruction has advanced significantly with neural rendering techniques. Among these, 3D Gaussian Splatting (3D-GS) outperforms Neural Radiance Fields (NeRFs) in quality and speed but struggles with shape and color discontinuities. We propose 3D Half-Gaussian (3D-HGS) kernels as a plug-and-play solution to address these limitations. Our experiments show that 3D-HGS enhances existing 3D-GS methods, achieving state-of-the-art rendering quality without compromising speed.
CVMay 2, 2023
Cross-view Action Recognition via Contrastive View-invariant RepresentationYuexi Zhang, Dan Luo, Balaji Sundareshan et al.
Cross view action recognition (CVAR) seeks to recognize a human action when observed from a previously unseen viewpoint. This is a challenging problem since the appearance of an action changes significantly with the viewpoint. Applications of CVAR include surveillance and monitoring of assisted living facilities where is not practical or feasible to collect large amounts of training data when adding a new camera. We present a simple yet efficient CVAR framework to learn invariant features from either RGB videos, 3D skeleton data, or both. The proposed approach outperforms the current state-of-the-art achieving similar levels of performance across input modalities: 99.4% (RGB) and 99.9% (3D skeletons), 99.4% (RGB) and 99.9% (3D Skeletons), 97.3% (RGB), and 99.2% (3D skeletons), and 84.4%(RGB) for the N-UCLA, NTU-RGB+D 60, NTU-RGB+D 120, and UWA3DII datasets, respectively.
CVOct 1, 2021
Self-Supervised Decomposition, Disentanglement and Prediction of Video Sequences while Interpreting Dynamics: A Koopman PerspectiveArmand Comas, Sandesh Ghimire, Haolin Li et al.
Human interpretation of the world encompasses the use of symbols to categorize sensory inputs and compose them in a hierarchical manner. One of the long-term objectives of Computer Vision and Artificial Intelligence is to endow machines with the capacity of structuring and interpreting the world as we do. Towards this goal, recent methods have successfully been able to decompose and disentangle video sequences into their composing objects and dynamics, in a self-supervised fashion. However, there has been a scarce effort in giving interpretation to the dynamics of the scene. We propose a method to decompose a video into moving objects and their attributes, and model each object's dynamics with linear system identification tools, by means of a Koopman embedding. This allows interpretation, manipulation and extrapolation of the dynamics of the different objects by employing the Koopman operator K. We test our method in various synthetic datasets and successfully forecast challenging trajectories while interpreting them.
CVJul 30, 2020
Key Frame Proposal Network for Efficient Pose Estimation in VideosYuexi Zhang, Yin Wang, Octavia Camps et al.
Human pose estimation in video relies on local information by either estimating each frame independently or tracking poses across frames. In this paper, we propose a novel method combining local approaches with global context. We introduce a light weighted, unsupervised, key frame proposal network (K-FPN) to select informative frames and a learned dictionary to recover the entire pose sequence from these frames. The K-FPN speeds up the pose estimation and provides robustness to bad frames with occlusion, motion blur, and illumination changes, while the learned dictionary provides global dynamic context. Experiments on Penn Action and sub-JHMDB datasets show that the proposed method achieves state-of-the-art accuracy, with substantial speed-up.
MLSep 8, 2019
Iterative Spectral Method for Alternative ClusteringChieh Wu, Stratis Ioannidis, Mario Sznaier et al.
Given a dataset and an existing clustering as input, alternative clustering aims to find an alternative partition. One of the state-of-the-art approaches is Kernel Dimension Alternative Clustering (KDAC). We propose a novel Iterative Spectral Method (ISM) that greatly improves the scalability of KDAC. Our algorithm is intuitive, relies on easily implementable spectral decompositions, and comes with theoretical guarantees. Its computation time improves upon existing implementations of KDAC by as much as 5 orders of magnitude.
MLSep 6, 2019
Spectral Non-Convex Optimization for Dimension Reduction with Hilbert-Schmidt Independence CriterionChieh Wu, Jared Miller, Yale Chang et al.
The Hilbert Schmidt Independence Criterion (HSIC) is a kernel dependence measure that has applications in various aspects of machine learning. Conveniently, the objectives of different dimensionality reduction applications using HSIC often reduce to the same optimization problem. However, the nonconvexity of the objective function arising from non-linear kernels poses a serious challenge to optimization efficiency and limits the potential of HSIC-based formulations. As a result, only linear kernels have been computationally tractable in practice. This paper proposes a spectral-based optimization algorithm that extends beyond the linear kernel. The algorithm identifies a family of suitable kernels and provides the first and second-order local guarantees when a fixed point is reached. Furthermore, we propose a principled initialization strategy, thereby removing the need to repeat the algorithm at random initialization points. Compared to state-of-the-art optimization algorithms, our empirical results on real data show a run-time improvement by as much as a factor of $10^5$ while consistently achieving lower cost and classification/clustering errors. The implementation source code is publicly available on https://github.com/endsley.
CVMar 20, 2018
DYAN: A Dynamical Atoms-Based Network for Video PredictionWenqian Liu, Abhishek Sharma, Octavia Camps et al.
The ability to anticipate the future is essential when making real time critical decisions, provides valuable information to understand dynamic natural scenes, and can help unsupervised video representation learning. State-of-art video prediction is based on LSTM recursive networks and/or generative adversarial network learning. These are complex architectures that need to learn large numbers of parameters, are potentially hard to train, slow to run, and may produce blurry predictions. In this paper, we introduce DYAN, a novel network with very few parameters and easy to train, which produces accurate, high quality frame predictions, significantly faster than previous approaches. DYAN owes its good qualities to its encoder and decoder, which are designed following concepts from systems identification theory and exploit the dynamics-based invariants of the data. Extensive experiments using several standard video datasets show that DYAN is superior generating frames and that it generalizes well across domains.
CVFeb 20, 2018
MoNet: Moments Embedding NetworkMengran Gou, Fei Xiong, Octavia Camps et al.
Bilinear pooling has been recently proposed as a feature encoding layer, which can be used after the convolutional layers of a deep network, to improve performance in multiple vision tasks. Different from conventional global average pooling or fully connected layer, bilinear pooling gathers 2nd order information in a translation invariant fashion. However, a serious drawback of this family of pooling layers is their dimensionality explosion. Approximate pooling methods with compact properties have been explored towards resolving this weakness. Additionally, recent results have shown that significant performance gains can be achieved by adding 1st order information and applying matrix normalization to regularize unstable higher order information. However, combining compact pooling with matrix normalization and other order information has not been explored until now. In this paper, we unify bilinear pooling and the global Gaussian embedding layers through the empirical moment matrix. In addition, we propose a novel sub-matrix square-root layer, which can be used to normalize the output of the convolution layer directly and mitigate the dimensionality problem with off-the-shelf compact pooling methods. Our experiments on three widely used fine-grained classification datasets illustrate that our proposed architecture, MoNet, can achieve similar or better performance than with the state-of-art G2DeNet. Furthermore, when combined with compact pooling technique, MoNet obtains comparable performance with encoded features with 96% less dimensions.
CVSep 20, 2017
Multi-camera Multi-Object TrackingWenqian Liu, Octavia Camps, Mario Sznaier
In this paper, we propose a pipeline for multi-target visual tracking under multi-camera system. For multi-camera system tracking problem, efficient data association across cameras, and at the same time, across frames becomes more important than single-camera system tracking. However, most of the multi-camera tracking algorithms emphasis on single camera across frame data association. Thus in our work, we model our tracking problem as a global graph, and adopt Generalized Maximum Multi Clique optimization problem as our core algorithm to take both across frame and across camera data correlation into account all together. Furthermore, in order to compute good similarity scores as the input of our graph model, we extract both appearance and dynamic motion similarities. For appearance feature, Local Maximal Occurrence Representation(LOMO) feature extraction algorithm for ReID is conducted. When it comes to capturing the dynamic information, we build Hankel matrix for each tracklet of target and apply rank estimation with Iterative Hankel Total Least Squares(IHTLS) algorithm to it. We evaluate our tracker on the challenging Terrace Sequences from EPFL CVLAB as well as recently published Duke MTMC dataset.
CVApr 1, 2016
Person Re-identification in Appearance Impaired ScenariosMengran Gou, Xikang Zhang, Angels Rates-Borras et al.
Person re-identification is critical in surveillance applications. Current approaches rely on appearance based features extracted from a single or multiple shots of the target and candidate matches. These approaches are at a disadvantage when trying to distinguish between candidates dressed in similar colors or when targets change their clothing. In this paper we propose a dynamics-based feature to overcome this limitation. The main idea is to capture soft biometrics from gait and motion patterns by gathering dense short trajectories (tracklets) which are Fisher vector encoded. To illustrate the merits of the proposed features we introduce three new "appearance-impaired" datasets. Our experiments on the original and the appearance impaired datasets demonstrate the benefits of incorporating dynamics-based information with appearance-based information to re-identification algorithms.
OCApr 3, 2015
Robust Anomaly Detection Using Semidefinite ProgrammingJose A. Lopez, Octavia Camps, Mario Sznaier
This paper presents a new approach, based on polynomial optimization and the method of moments, to the problem of anomaly detection. The proposed technique only requires information about the statistical moments of the normal-state distribution of the features of interest and compares favorably with existing approaches (such as Parzen windows and 1-class SVM). In addition, it provides a succinct description of the normal state. Thus, it leads to a substantial simplification of the the anomaly detection problem when working with higher dimensional datasets.