Jeff S. Shamma

h-index49

13papers

88citations

Novelty45%

AI Score46

Ranked #38,180 of 194,257 authors (top 20%)#178 in SY (top 11%)

13 Papers

1.2SYApr 11, 2018

Herdable Systems Over Signed, Directed Graphs

Sebastian F. Ruf, Magnus Egerstedt, Jeff S. Shamma

This paper considers the notion of herdability, a set-based reachability condition, which asks whether the state of a system can be controlled to be element-wise larger than a non-negative threshold. The basic theory of herdable systems is presented, including a necessary and sufficient condition for herdability. This paper then considers the impact of the underlying graph structure of a linear system on the herdability of the system, for the case where the graph is represented as signed and directed. By classifying nodes based on the length and sign of walks from an input, we find a class of completely herdable systems as well as provide a complete characterization of nodes that can be herded in systems with an underlying graph that is a directed out-branching rooted at a single input.

1.2SYApr 28, 2018

Herding Positive, Complex Networks

Sebastian F. Ruf, Magnus Egersted, Jeff S. Shamma

The problem of controlling complex networks is of interest to disciplines ranging from biology to swarm robotics. However, controllability can be too strict a condition, failing to capture a range of desirable behaviors. Herdability, which describes the ability to drive a system to a specific set in the state space, was recently introduced as an alternative network control notion. This paper considers the application of herdability to the study of complex networks under the assumption that a positive system evolves on the network. The herdability of a class of networked systems is investigated and two problems related to ensuring system herdability are explored. The first is the input addition problem, which investigates which nodes in a network should receive inputs to ensure that the system is herdable. The second is a related problem of selecting the best single node from which to herd the network, in the case that a single node is guaranteed to make the system is herdable. In order to select the best herding node, a novel control energy based herdability centrality measure is introduced.

5.7MAMay 4

Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

Sarah A. Toonsi, Jeff S. Shamma

We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.

6.1SYMar 18

Convergence of Payoff-Based Higher-Order Replicator Dynamics in Contractive Games

Hassan Abdelraouf, Vijay Gupta, Jeff S. Shamma

We study the convergence properties of a payoff-based higher-order version of replicator dynamics, a widely studied model in evolutionary dynamics and game-theoretic learning, in contractive games. Recent work has introduced a control-theoretic perspective for analyzing the convergence of learning dynamics through passivity theory, leading to a classification of learning dynamics based on the passivity notion they satisfy, such as \textdelta-passivity, equilibrium-independent passivity, and incremental passivity. We leverage this framework for the study of higher-order replicator dynamics for contractive games, which form the complement of passive learning dynamics. Standard replicator dynamics can be represented as a cascade interconnection between an integrator and the softmax mapping. Payoff-based higher-order replicator dynamics include a linear time-invariant (LTI) system in parallel with the existing integrator. First, we show that if this added system is strictly passive and asymptotically stable, then the resulting learning dynamics converge locally to the Nash equilibrium in contractive games. Second, we establish global convergence properties using incremental stability analysis for the special case of symmetric matrix contractive games.

7.1LGSep 28, 2025

Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning

Runyu Zhang, Na Li, Asuman Ozdaglar et al.

Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL) have largely emphasized the risk-averse setting, prioritizing robustness to uncertainty. In cooperative MARL, however, such conservatism often leads to suboptimal equilibria, and a parallel line of work has shown that optimism can promote cooperation. Existing optimistic methods, though effective in practice, are typically heuristic and lack theoretical grounding. Building on the dual representation for convex risk measures, we propose a principled framework that interprets risk-seeking objectives as optimism. We introduce optimistic value functions, which formalize optimism as divergence-penalized risk-seeking evaluations. Building on this foundation, we derive a policy-gradient theorem for optimistic value functions, including explicit formulas for the entropic risk/KL-penalty setting, and develop decentralized optimistic actor-critic algorithms that implement these updates. Empirical results on cooperative benchmarks demonstrate that risk-seeking optimism consistently improves coordination over both risk-neutral baselines and heuristic optimistic methods. Our framework thus unifies risk-sensitive learning and optimism, offering a theoretically grounded and practically effective approach to cooperation in MARL.

8.6SYOct 31, 2021

Safe Adaptive Learning-based Control for Constrained Linear Quadratic Regulators with Regret Guarantees

Yingying Li, Subhro Das, Jeff Shamma et al.

We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions. The challenges of this problem arise from the tension among safety, exploration, performance, and computation. To address these challenges, we propose a polynomial-time algorithm that guarantees feasibility and constraint satisfaction with high probability under proper conditions. Our algorithm is implemented on a single trajectory and does not require system restarts. Further, we analyze the regret of our learning algorithm compared to the optimal safe linear controller with known model information. The proposed algorithm can achieve a $\tilde O(T^{2/3})$ regret, where $T$ is the number of stages and $\tilde O(\cdot)$ absorbs some logarithmic terms of $T$.

4.1ROJun 12, 2020Code

RISCuer: A Reliable Multi-UAV Search and Rescue Testbed

Mohamed Abdelkader, Usman A. Fiaz, Noureddine Toumi et al.

We present the Robotics Intelligent Systems & Control (RISC) Lab multiagent testbed for reliable search and rescue and aerial transport in outdoor environments. The system consists of a team of three multirotor unmanned aerial vehicles (UAVs), which are capable of autonomously searching, picking up, and transporting randomly distributed objects in an outdoor field. The method involves vision based object detection and localization, passive aerial grasping with our novel design, GPS based UAV navigation, and safe release of the objects at the drop zone. Our cooperative strategy ensures safe spatial separation between UAVs at all times and we prevent any conflicts at the drop zone using communication enabled consensus. All computation is performed onboard each UAV. We describe the complete software and hardware architecture for the system and demonstrate its reliable performance using comprehensive outdoor experiments, and by comparing our results with some recent, similar works.

1.2GTApr 25, 2019

Smart Jammer and LTE Network Strategies in An Infinite-Horizon Zero-Sum Repeated Game with Asymmetric and Incomplete Information

Farhan M. Aziz, Lichun Li, Jeff S. Shamma et al.

LTE/LTE-Advanced networks are known to be vulnerable to denial-of-service and loss-of-service attacks from smart jammers. In this article, the interaction between a smart jammer and LTE network is modeled as an infinite-horizon, zero-sum, asymmetric repeated game. The smart jammer and eNode B are modeled as the informed and the uninformed player, respectively. The main purpose of this article is to construct efficient suboptimal strategies for both players that can be used to solve the above-mentioned infinite-horizon repeated game with asymmetric and incomplete information. It has been shown in game-theoretic literature that security strategies provide optimal solution in zero-sum games. It is also shown that both players' security strategies in an infinite-horizon asymmetric game depend only on the history of the informed player's actions. However, fixed-sized sufficient statistics are needed for both players to solve the above-mentioned game efficiently. The smart jammer uses its evolving belief state as the fixed-sized sufficient statistics for the repeated game. Whereas, the LTE network (uninformed player) uses worst-case regret of its security strategy and its anti-discounted update as the fixed-sized sufficient statistics. Although fixed-sized sufficient statistics are employed by both players, optimal security strategy computation in λ-discounted asymmetric games is still hard to perform because of non-convexity. Hence, the problem is convexified in this article by devising `approximated' security strategies for both players that are based on approximated optimal game value. However, `approximated' strategies require full monitoring. Therefore, a simplistic yet effective `expected' strategy is also constructed for the LTE network that does not require full monitoring. The simulation results show that the smart jammer plays non-revealing and misleading strategies.

1.2SYApr 17, 2019

Herdability of Linear Systems Based on Sign Patterns and Graph Structures

Sebastian F Ruf, Magnus Egerstedt, Jeff S. Shamma

We consider the notion of herdability, a set-based reachability condition, which asks whether the state of a system can be controlled to be element-wise larger than a non-negative threshold. First a number of foundational results on herdability of a continuous time, linear time invariant system are presented. These show that the herdability of a linear system can be determined based on certain matrices, such as the controllability matrix, which arise in the study of controllability of linear systems. Second, the relationship between the sign pattern of the underlying graph structure of a system and the herdability properties of the system is investigated. In doing so the notion of sign herdability is introduced which captures classes of systems whose sign pattern determines their herdability. We identify a set of conditions, first on the sign pattern of the controllability matrix and then on the underlying graph structure, that ensure that the system is sign herdable.

3.5ROJan 6, 2019

usBot: A Modular Robotic Testbed for Programmable Self-Assembly

Usman A. Fiaz, Jeff S. Shamma

We present the design, characterization, and experimental results for a new modular robotic system for programmable self-assembly. The proposed system uses the Hybrid Cube Model (HCM), which integrates classical features from both deterministic and stochastic self-organization models. Thus, for instance, the modules are passive as far as their locomotion is concerned (stochastic), and yet they possess an active undocking routine (deterministic). The robots are constructed entirely from readily accessible components and unlike many existing robots, their excitation is not fluid mediated. Instead, the actuation setup is a solid state, independently programmable and highly portable platform. The system is capable of demonstrating fully autonomous and distributed stochastic self-assembly in two dimensions. It is shown to emulate the performance of several existing modular systems and promises to be a substantial effort towards developing a universal testbed for programmable self-assembly.

2.9ROSep 21, 2018

Infrastructure-free Localization of Aerial Robots with Ultrawideband Sensors

Samet Guler, Mohamed Abdelkader, Jeff S. Shamma

Robots in a swarm take advantage of a motion capture system or GPS sensors to obtain their global position. However, motion capture systems are environment-dependent and GPS sensors are not reliable in occluded environments. For a reliable and versatile operation in a swarm, robots must sense each other and interact locally. Motivated by this requirement, here we propose an on-board localization framework for multi-robot systems. Our framework consists of an anchor robot with three ultrawideband (UWB) sensors and a tag robot with a single UWB sensor. The anchor robot utilizes the three UWB sensors as a localization infrastructure and estimates the tag robot's location by using its on-board sensing and computational capabilities solely, without explicit inter-robot communication. We utilize a dual Monte-Carlo localization approach to capture the agile maneuvers of the tag robot with an acceptable precision. We validate the effectiveness of our algorithm with simulations and indoor and outdoor experiments on a two-drone setup. The proposed dual MCL algorithm yields highly accurate estimates for various speed profiles of the tag robot and demonstrates a superior performance over the standard particle filter and the extended Kalman Filter.

0.8LGApr 8, 2018

Path to Stochastic Stability: Comparative Analysis of Stochastic Learning Dynamics in Games

Hassan Jaleel, Jeff S. Shamma

Stochastic stability is a popular solution concept for stochastic learning dynamics in games. However, a critical limitation of this solution concept is its inability to distinguish between different learning rules that lead to the same steady-state behavior. We address this limitation for the first time and develop a framework for the comparative analysis of stochastic learning dynamics with different update rules but same steady-state behavior. We present the framework in the context of two learning dynamics: Log-Linear Learning (LLL) and Metropolis Learning (ML). Although both of these dynamics have the same stochastically stable states, LLL and ML correspond to different behavioral models for decision making. Moreover, we demonstrate through an example setup of sensor coverage game that for each of these dynamics, the paths to stochastically stable states exhibit distinctive behaviors. Therefore, we propose multiple criteria to analyze and quantify the differences in the short and medium run behavior of stochastic learning dynamics. We derive and compare upper bounds on the expected hitting time to the set of Nash equilibria for both LLL and ML. For the medium to long-run behavior, we identify a set of tools from the theory of perturbed Markov chains that result in a hierarchical decomposition of the state space into collections of states called cycles. We compare LLL and ML based on the proposed criteria and develop invaluable insights into the comparative behavior of the two dynamics.

1.2MASep 2, 2015

A Game-theoretic Formulation of the Homogeneous Self-Reconfiguration Problem

Daniel Pickem, Magnus Egerstedt, Jeff S. Shamma

In this paper we formulate the homogeneous two- and three-dimensional self-reconfiguration problem over discrete grids as a constrained potential game. We develop a game-theoretic learning algorithm based on the Metropolis-Hastings algorithm that solves the self-reconfiguration problem in a globally optimal fashion. Both a centralized and a fully distributed algorithm are presented and we show that the only stochastically stable state is the potential function maximizer, i.e. the desired target configuration. These algorithms compute transition probabilities in such a way that even though each agent acts in a self-interested way, the overall collective goal of self-reconfiguration is achieved. Simulation results confirm the feasibility of our approach and show convergence to desired target configurations.