Cédric Langbort

h-index: 25

18 papers

2,708 citations

Novelty: 40%

AI Score: 46

Ranked #37,614 of 194,257 authors (top 19%); #78 in GT (top 21%)

18 Papers

1.2 | GT | Sep 29, 2017

Strategic Communication Between Prospect Theoretic Agents over a Gaussian Test Channel

Venkata Sriram Siddhardh Nadendla, Emrah Akyol, Cedric Langbort et al.

In this paper, we model a Stackelberg game in a simple Gaussian test channel where a human transmitter (leader) communicates a source message to a human receiver (follower). We model human decision making using prospect theory models proposed for continuous decision spaces. Assuming that the value function is the squared distortion at both the transmitter and the receiver, we analyze the effects of the weight functions at both the transmitter and the receiver on optimal communication strategies, namely encoding at the transmitter and decoding at the receiver, in the Stackelberg sense. We show that the optimal strategies for the behavioral agents in the Stackelberg sense are identical to those designed for unbiased agents. At the same time, we also show that the prospect-theoretic distortions at the transmitter and the receiver are both larger than the expected distortion, thus making behavioral agents less content than unbiased agents. Consequently, the presence of cognitive biases increases the need for transmission power in order to achieve a given distortion at both transmitter and receiver.
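
For reference, the expected distortion that the abstract compares against can be written out for the textbook scalar case. This is only a sketch under assumptions not stated in the abstract: a zero-mean Gaussian source $X$ of variance $\sigma_X^2$, a linear encoder with gain $\alpha$ meeting a power budget $P$, independent additive Gaussian noise $N$ of variance $\sigma_N^2$, and an MMSE decoder. All symbols are introduced here for illustration.

$$
Y = \alpha X + N, \qquad \alpha^2 \sigma_X^2 = P, \qquad \hat{X} = \frac{\alpha \sigma_X^2}{P + \sigma_N^2}\, Y, \qquad D = \mathbb{E}\big[(X - \hat{X})^2\big] = \frac{\sigma_X^2 \sigma_N^2}{P + \sigma_N^2}.
$$

Solving for the power gives $P = \sigma_N^2 \left( \sigma_X^2 / D - 1 \right)$, so in this sketch a smaller target distortion requires more transmission power, which is the direction of the trade-off the abstract describes.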

4.6 | OC | Dec 21, 2011

Decentralized Disturbance Accommodation with Limited Plant Model Information

F. Farokhi, C. Langbort, K. H. Johansson

The design of optimal disturbance accommodation and servomechanism controllers with limited plant model information is considered in this paper. Their closed-loop performance is compared using a performance metric called the competitive ratio, which is the worst-case ratio of the cost of a given control design strategy to the cost of the optimal control design with full model information. It was recently shown that when it comes to designing optimal centralized or partially structured decentralized state-feedback controllers with limited model information, the best control design strategy in terms of competitive ratio is a static one. This is true even though the optimal structured decentralized state-feedback controller with full model information is dynamic. In this paper, we show that, in contrast, the best limited model information control design strategy for the disturbance accommodation problem gives a dynamic controller. We find an explicit minimizer of the competitive ratio and we show that it is undominated, that is, there is no other control design strategy that performs better for all possible plants while having the same worst-case ratio. This optimal controller can be separated into a static feedback law and a dynamic disturbance observer. For constant disturbances, it is shown that this structure corresponds to proportional-integral control.
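
The competitive ratio described in this abstract can be written compactly. The notation below is assumed for illustration and does not come from the abstract: $\mathcal{P}$ is the set of admissible plants, $\Gamma$ is a limited-model-information design strategy that maps a plant to a controller, $K^*(P)$ is the optimal controller designed with full model information, and $J_P(K)$ is the closed-loop cost of controller $K$ on plant $P$.

$$
r(\Gamma) = \sup_{P \in \mathcal{P}} \frac{J_P\big(\Gamma(P)\big)}{J_P\big(K^*(P)\big)}
$$

Because $K^*(P)$ is optimal for each plant, $r(\Gamma) \ge 1$ whenever the costs are positive, and a strategy with a smaller $r(\Gamma)$ loses less in the worst case from not knowing the full model.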

4.6 | OC | Mar 13, 2012

Optimal Disturbance Accommodation with Limited Model Information

F. Farokhi, C. Langbort, K. H. Johansson

The design of optimal dynamic disturbance accommodation controllers with limited model information is considered. We adapt the family of limited model information control design strategies, defined earlier by the authors, to handle dynamic controllers. This family of limited model information design strategies constructs subcontrollers distributively by accessing only local plant model information. The closed-loop performance of the dynamic controllers that they can produce is studied using a performance metric called the competitive ratio, which is the worst-case ratio of the cost of a control design strategy to the cost of the optimal control design with full model information.

1.2 | GT | Sep 17, 2012

Nash Equilibria for Stochastic Games with Asymmetric Information-Part 1: Finite Games

Ashutosh Nayyar, Abhishek Gupta, Cédric Langbort et al.

A model of stochastic games where multiple controllers jointly control the evolution of the state of a dynamic system but have access to different information about the state and action processes is considered. The asymmetry of information among the controllers makes it difficult to compute or characterize Nash equilibria. Using common information among the controllers, the game with asymmetric information is shown to be equivalent to another game with symmetric information. Further, under certain conditions, a Markov state is identified for the equivalent symmetric information game and its Markov perfect equilibria are characterized. This characterization provides a backward induction algorithm to find Nash equilibria of the original game with asymmetric information in pure or behavioral strategies. Each step of this algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class of Nash equilibria of the original game that can be characterized in this backward manner is named common information based Markov perfect equilibria.

5.4 | AI | Jun 24, 2023 | Code

Pointwise-in-Time Explanation for Linear Temporal Logic Rules

Noel Brindise, Cedric Langbort

The new field of Explainable Planning (XAIP) has produced a variety of approaches to explain and describe the behavior of autonomous agents to human observers. Many summarize agent behavior in terms of the constraints, or "rules," which the agent adheres to during its trajectories. In this work, we narrow the focus from summary to specific moments in individual trajectories, offering a "pointwise-in-time" view. Our novel framework, which we define on Linear Temporal Logic (LTL) rules, assigns an intuitive status to any rule in order to describe the trajectory progress at individual time steps; here, a rule is classified as active, satisfied, inactive, or violated. Given a trajectory, a user may query the status of specific LTL rules at individual trajectory time steps. In this paper, we present this novel framework, named Rule Status Assessment (RSA), and provide an example of its implementation. We find that pointwise-in-time status assessment is useful as a post-hoc diagnostic, enabling a user to systematically track the agent's behavior with respect to a set of rules.

6.6 | OC | Apr 20

Steady-state Based Approach to Online Non-stochastic Control

Vijeth Hebbar, Spencer Hutchinson, Mahnoosh Alizadeh et al.

We study the problem of online non-stochastic control (ONC), which is the control of a linear system under adversarial disturbances and adversarial cost functions, with the aim of minimizing the total cost incurred. A recent line of literature in ONC develops algorithms that enjoy sublinear regret with respect to a benchmark based on the set of steady-states that are attainable by a constant input. In this work, we extend this research direction by giving an algorithm that enjoys $\mathcal{O}(\sqrt{T})$ regret with respect to a richer benchmark set, namely the set of steady-states attainable under an affine controller. Since this benchmark substantially broadens the comparison class, it provides significantly stronger performance guarantees. Our proposed algorithm combines a Follow-The-Perturbed-Leader-style online non-convex optimization approach with a batching method that maintains stability despite changing policies. Although our proposed algorithm requires solving non-convex subproblems, we show that an approximate solution to this subproblem is sufficient to ensure $\mathcal{O}(\sqrt{T})$ regret. Furthermore, numerical experiments show that our algorithm enjoys lower total cost and similar computation to existing methods in certain settings.

7.1 | LG | Apr 18

Live LTL Progress Tracking: Towards Task-Based Exploration

Noel Brindise, Cedric Langbort, Melkior Ornik

Motivated by the challenge presented by non-Markovian objectives in reinforcement learning (RL), we present a novel framework to track and represent the progress of autonomous agents through complex, multi-stage tasks. Given a specification in finite linear temporal logic (LTL), the framework establishes a 'tracking vector' which updates at each time step in a trajectory rollout. The values of the vector represent the status of the specification as the trajectory develops, assigning true, false, or 'open' labels (where 'open' is used for indeterminate cases). Applied to an LTL formula tree, the tracking vector can be used to encode detailed information about how a task is executed over a trajectory, providing a potential tool for new performance metrics, diverse exploration, and reward shaping. In this paper, we formally present the framework and algorithm, collectively named Live LTL Progress Tracking, give a simple working example, and demonstrate avenues for its integration into RL models. Future work will apply the framework to problems such as task-space exploration and diverse solution-finding in RL.
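
The three-valued labelling described in this abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes a trajectory given as a list of dictionaries of atomic propositions and handles only two hypothetical rule shapes, "eventually p" and "always p", labelling each as true, false, or open after every step of the rollout.

```python
from typing import Dict, List

State = Dict[str, bool]  # atomic propositions that hold at one time step


def track_eventually(trace: List[State], prop: str) -> List[str]:
    """Label 'eventually prop' after each step of a trajectory prefix."""
    labels, seen = [], False
    for state in trace:
        seen = seen or state.get(prop, False)
        # Once prop has been observed the rule is settled; until then it is open.
        labels.append("true" if seen else "open")
    return labels


def track_always(trace: List[State], prop: str) -> List[str]:
    """Label 'always prop' after each step of a trajectory prefix."""
    labels, broken = [], False
    for state in trace:
        broken = broken or not state.get(prop, False)
        # A single violation settles the rule as false; otherwise it stays open.
        labels.append("false" if broken else "open")
    return labels


# Hypothetical rollout with two propositions, "goal" and "safe".
trace = [
    {"goal": False, "safe": True},
    {"goal": False, "safe": True},
    {"goal": True, "safe": True},
    {"goal": True, "safe": False},
]

# One (eventually goal, always safe) pair per time step.
tracking = list(zip(track_eventually(trace, "goal"), track_always(trace, "safe")))
```

A finite-trace semantics would also resolve any label that is still open at the final step; this sketch leaves that step out and only shows how the labels evolve during the rollout.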

3.8 | SY | Jun 20

Regret-Guaranteed Safe Switching: LQR Setting with Unknown Dynamics

Jafar Abbaszadeh Chekan, S. Rasoul Etesami, Cedric Langbort

We consider learning-based control in an LQR setting, where the parameters associated with each mode are a priori unknown. The next mode to be activated is revealed online only at the time of switching. The objective is to determine both the switching times and the control gains for each mode such that (1) the norm of the system state remains bounded according to a prescribed criterion, and (2) the accumulated cost is minimized. To formalize the state-norm requirement, we introduce the notion of $(α,β)$-controllability for given parameters $α$ and $β$. We first study the problem in a known model setting and show that, under the switching mechanism described above and under the assumption that each mode is visited infinitely often, the strategy that minimizes the average expected cost consists of applying, in each mode, the feedback gain obtained from the solution of the discrete algebraic Riccati equation, while selecting dwell times that sufficiently satisfy the controllability condition. We refer to this strategy as the benchmark policy. Next, we propose an algorithm for the unknown-model setting that minimizes the regret, defined as the difference between the cumulative cost incurred by the online algorithm and that of the offline benchmark. By accurately estimating dwell-time errors, our method achieves an expected regret of $\mathcal{O}(|\mathcal{M}|^{1/4} n_s^{3/4} + n_m)$, where $n_s$ denotes the number of switches, $|\mathcal{M}|$ is the number of modes, and $n_m$ is the number of malignant switches.
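
The per-mode feedback gain mentioned in this abstract comes from a discrete algebraic Riccati equation. As a minimal sketch, assuming a single scalar mode $x_{t+1} = a x_t + b u_t$ with stage cost $q x_t^2 + r u_t^2$ (the function name and parameters are illustrative and not taken from the paper), the equation can be solved by fixed-point iteration:

```python
def scalar_dare_gain(a: float, b: float, q: float, r: float, iters: int = 200):
    """Iterate the scalar Riccati recursion and return (p, k).

    p approximates the DARE solution and k is the gain of the feedback
    law u = -k * x. Assumes r > 0 and b != 0 (illustrative setting only).
    """
    p = q  # start the recursion from the state cost
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return p, k


# Example mode: a = b = q = r = 1, for which p solves p**2 - p - 1 = 0.
p, k = scalar_dare_gain(1.0, 1.0, 1.0, 1.0)
closed_loop = 1.0 - 1.0 * k  # a - b * k, stable if its magnitude is below 1
```

In a switched setting one such gain would be computed per mode; the dwell-time selection discussed in the abstract is a separate step that this sketch does not attempt.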

2.0 | LG | Dec 11, 2023 | Code

Online Decision Making with History-Average Dependent Costs (Extended)

Vijeth Hebbar, Cedric Langbort

In many online sequential decision-making scenarios, a learner's choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history-dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of the history horizon on the design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.
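
For context, the sketch below shows the standard Follow-The-Regularized-Leader update that FTARL's name builds on. It is not the FTARL algorithm: it assumes linear costs $g_t x$ on the interval $[-\text{radius}, \text{radius}]$ and a fixed quadratic regularizer $x^2 / (2\eta)$, whereas the abstract describes regularizers that adapt to past decisions. The function name and parameters are hypothetical.

```python
from typing import List


def ftrl_quadratic(gradients: List[float], eta: float, radius: float = 1.0) -> List[float]:
    """Standard FTRL on [-radius, radius] with a fixed quadratic regularizer.

    Each decision minimizes (sum of past g) * x + x**2 / (2 * eta) over the
    interval, which has the closed form clip(-eta * sum of past g).
    """
    decisions, running_sum = [], 0.0
    for g in gradients:
        x = max(-radius, min(radius, -eta * running_sum))
        decisions.append(x)  # decide before seeing the current gradient
        running_sum += g     # then record the revealed gradient
    return decisions


# Four rounds with a constant cost gradient of +1 and step size 0.5.
decisions = ftrl_quadratic([1.0, 1.0, 1.0, 1.0], eta=0.5)
```

With a constant positive gradient the decisions move toward the lower end of the interval and stay there, since the constraint clips the unconstrained minimizer.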

7.1 | LG | Jun 11, 2025

"What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended)

Noel Brindise, Vijeth Hebbar, Riya Shah et al.

In this work, we provide an extended discussion of a new approach to explainable Reinforcement Learning called Diverse Near-Optimal Alternatives (DNA), first proposed at L4DC 2025. DNA seeks a set of reasonable "options" for trajectory-planning agents, optimizing policies to produce qualitatively diverse trajectories in Euclidean space. In the spirit of explainability, these distinct policies are used to "explain" an agent's options in terms of available trajectory shapes from which a human user may choose. In particular, DNA applies to value function-based policies on Markov decision processes where agents are limited to continuous trajectories. Here, we describe DNA, which uses reward shaping in local, modified Q-learning problems to solve for distinct policies with guaranteed epsilon-optimality. We show that it successfully returns qualitatively different policies that constitute meaningfully different "options" in simulation, including a brief comparison to related approaches in the stochastic optimization field of Quality Diversity. Beyond the explanatory motivation, this work opens new possibilities for exploration and adaptive planning in RL.

3.1 | ML | Jun 11, 2024

Any-Time Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems

Jafar Abbaszadeh Chekan, Cedric Langbort

We propose a computationally efficient algorithm that achieves anytime regret of order $\mathcal{O}(\sqrt{t})$, with explicit dependence on the system dimensions and on the solution of the Discrete Algebraic Riccati Equation (DARE). Our approach builds on the SDP-based framework of Cohen et al. (2019), using an appropriately tuned regularization and a sufficiently accurate initial estimate to construct confidence ellipsoids for control design. A carefully designed input-perturbation mechanism is incorporated to ensure anytime performance. We develop two variants of the algorithm. The first enforces a notion of strong sequential stability, requiring each policy to be stabilizing and successive policies to remain close. However, enforcing this notion results in a suboptimal regret scaling. The second removes the sequential-stability requirement and instead requires only that each generated policy be stabilizing. Closed-loop stability is then preserved through a dwell-time-inspired policy-update rule, adapting ideas from switched-systems control to carefully balance exploration and exploitation. This class of algorithms also addresses key shortcomings of most existing approaches, including certainty-equivalence-based methods, which typically guarantee stability only in the Lyapunov sense and lack explicit uniform high-probability bounds on the state trajectory expressed in system-theoretic terms. Our analysis explicitly characterizes the trade-off between state amplification and regret, and shows that partially relaxing the sequential-stability requirement yields optimal regret. Finally, our method eliminates the need for any a priori bound on the norm of the DARE solution, an assumption required by all existing computationally efficient optimism in the face of uncertainty (OFU) based algorithms, and thereby removes the reliance of regret guarantees on such external inputs.

3.3 | SY | Oct 9, 2018

Detection and Mitigation of Biasing Attacks on Distributed Estimation Networks

Mohammad Deghat, Valery Ugrinovskii, Iman Shames et al.

The paper considers a problem of detecting and mitigating biasing attacks on networks of state observers targeting cooperative state estimation algorithms. The problem is cast within the recently developed framework of distributed estimation utilizing the vector dissipativity approach. The paper shows that a network of distributed observers can be endowed with an additional attack detection layer capable of detecting biasing attacks and correcting their effect on estimates produced by the network. An example is provided to illustrate the performance of the proposed distributed attack detector.

1.0 | ML | Feb 19, 2018

On Estimating Multi-Attribute Choice Preferences using Private Signals and Matrix Factorization

Venkata Sriram Siddhardh Nadendla, Cedric Langbort

Revealed preference theory studies the possibility of modeling an agent's revealed preferences and the construction of a consistent utility function. However, modeling an agent's choices over preference orderings is not always practical and demands strong assumptions on human rationality and data-acquisition abilities. Therefore, we propose a simple generative choice model where agents are assumed to generate the choice probabilities based on latent factor matrices that capture their choice evaluation across multiple attributes. Since the multi-attribute evaluation is typically hidden within the agent's psyche, we consider a signaling mechanism where agents are provided with choice information through private signals, so that the agent's choices provide more insight about his/her latent evaluation across multiple attributes. We estimate the choice model via a novel multi-stage matrix factorization algorithm that minimizes the average deviation of the factor estimates from choice data. Simulation results are presented to validate the estimation performance of our proposed algorithm.

1.2 | SY | Jun 5, 2017

Controller-jammer game models of Denial of Service in control systems operating over packet-dropping links

V. Ugrinovskii, C. Langbort

The paper introduces a class of zero-sum games between an adversary and a controller as a scenario for a 'denial of service' in a networked control system. The communication link is modeled as a set of transmission regimes controlled by a strategic jammer whose intention is to wage an attack on the plant by choosing a most damaging regime-switching strategy. We demonstrate that even in the one-step case, the introduced games admit a saddle-point equilibrium, at which the jammer's optimal policy is to randomize in a region of the plant's state space, thus requiring the controller to undertake a nontrivial response which is different from what one would expect in a standard stochastic control problem over a packet-dropping link. The paper derives conditions for the introduced games to have such a saddle-point equilibrium. Furthermore, we show that in more general multi-stage games, these conditions provide 'greedy' jamming strategies for the adversary.
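
To illustrate why a saddle point can require randomization, the sketch below solves a generic 2x2 zero-sum matrix game in mixed strategies. It is a textbook illustration, not the controller-jammer model from the paper: the payoff matrices are made up, the row player is assumed to maximize and the column player to minimize, and the closed form is valid only when the game has no saddle point in pure strategies.

```python
from typing import Sequence, Tuple


def mixed_saddle_2x2(m: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Mixed saddle point of a 2x2 zero-sum game with no pure saddle point.

    m[i][j] is the payoff to the maximizing row player. Returns (p, q, value)
    where p is the probability of row 0 and q is the probability of column 0.
    """
    (a, b), (c, d) = m
    denom = a - b - c + d
    p = (d - c) / denom  # makes the column player indifferent between columns
    q = (d - b) / denom  # makes the row player indifferent between rows
    value = (a * d - b * c) / denom
    return p, q, value


# Matching pennies: both players randomize uniformly and the value is zero.
pennies = mixed_saddle_2x2([[1.0, -1.0], [-1.0, 1.0]])

# An asymmetric example with no pure saddle point.
p, q, value = mixed_saddle_2x2([[3.0, -1.0], [-2.0, 1.0]])
```

At the mixed saddle point each player's randomization leaves the opponent indifferent between its two actions, so neither side can gain by deviating.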

2.3 | GT | Jan 27, 2017

Optimal Communication Strategies in Networked Cyber-Physical Systems with Adversarial Elements

Emrah Akyol, Kenneth Rose, Tamer Basar et al.

This paper studies optimal communication and coordination strategies in cyber-physical systems for both defender and attacker within a game-theoretic framework. We model the communication network of a cyber-physical system as a sensor network which involves one single Gaussian source observed by many sensors, subject to additive independent Gaussian observation noises. The sensors communicate with the estimator over a coherent Gaussian multiple access channel. The aim of the receiver is to reconstruct the underlying source with minimum mean squared error. The scenario of interest here is one where some of the sensors are captured by the attacker and they act as the adversary (jammer): they strive to maximize distortion. The receiver (estimator) knows the captured sensors but still cannot simply ignore them due to the multiple access channel, i.e., the outputs of all sensors are summed to generate the estimator input. We show that the ability of transmitter sensors to secretly agree on a random event, that is "coordination", plays a key role in the analysis...

3.1 | CR | Jul 12, 2016

Scalar Quadratic-Gaussian Soft Watermarking Games

Kivanc Mihcak, Emrah Akyol, Tamer Basar et al.

We introduce the zero-sum game problem of soft watermarking: the hidden information (watermark) comes from a continuum and has a perceptual value; the receiver generates an estimate of the embedded watermark to minimize the expected estimation error (unlike conventional watermarking schemes, where both the hidden information and the receiver output are from a discrete finite set). Applications include embedding one piece of multimedia content into another. We consider in this paper the scalar Gaussian case and use expected mean-squared distortion. We formulate the resulting problem as a zero-sum game between the encoder and receiver pair and the attacker. We show that for the linear encoder, the optimal attacker is Gaussian-affine, derive the optimal system parameters in that case, and discuss the corresponding system behavior. We also provide numerical results to gain further insight and understanding of the system behavior at optimality.

1.2 | SY | Sep 17, 2016

Detection of Biasing Attacks on Distributed Estimation Networks

Mohammad Deghat, Valery Ugrinovskii, Iman Shames et al.

The paper addresses the problem of detecting attacks on distributed estimator networks that aim to intentionally bias process estimates produced by the network. It provides a sufficient condition, in terms of the feasibility of certain linear matrix inequalities, which guarantees distributed input attack detection using an $H_\infty$ approach.

1.2 | GT | Jun 25, 2015

Estimation with Strategic Sensors

Farhad Farokhi, Andre M. H. Teixeira, Cedric Langbort

We introduce a model of estimation in the presence of strategic, self-interested sensors. We employ a game-theoretic setup to model the interaction between the sensors and the receiver. The cost function of the receiver is equal to the estimation error variance, while the cost function of the sensor contains an extra term which is determined by its private information. We start with the single-sensor case, in which the receiver has access to noisy but honest side information in addition to the message transmitted by a strategic sensor. We study both static and dynamic estimation problems. For both these problems, we characterize a family of equilibria in which the sensor and the receiver employ simple strategies. Interestingly, for the dynamic estimation problem, we find an equilibrium for which the strategic sensor uses a memoryless policy. We generalize the static estimation setup to multiple sensors with a synchronous communication structure (i.e., all the sensors transmit their messages simultaneously). We prove the perhaps surprising fact that, for the constructed equilibrium in affine strategies, the estimation quality degrades as the number of sensors increases. However, if the sensors are herding (i.e., copying each other's policies), the quality of the receiver's estimation improves as the number of sensors increases. Finally, we consider the asynchronous communication structure (i.e., the sensors transmit their messages sequentially).