David Fridovich-Keil

LG
h-index52
59papers
897citations
Novelty54%
AI Score57

59 Papers

SYMar 6, 2018
Planning, Fast and Slow: A Framework for Adaptive Real-Time Safe Trajectory Planning

David Fridovich-Keil, Sylvia L. Herbert, Jaime F. Fisac et al.

Motion planning is an extremely well-studied problem in the robotics community, yet existing work largely falls into one of two categories: computationally efficient but with few if any safety guarantees, or able to give stronger guarantees but at high computational cost. This work builds on a recent development called FaSTrack in which a slow offline computation provides a modular safety guarantee for a faster online planner. We introduce the notion of "meta-planning" in which a refined offline computation enables safe switching between different online planners. This provides autonomous systems with the ability to adapt motion plans to a priori unknown environments in real-time as sensor measurements detect new obstacles, and the flexibility to maneuver differently in the presence of obstacles than they would in free space, all while maintaining a strict safety guarantee. We demonstrate the meta-planning algorithm both in simulation and in hardware using a small Crazyflie 2.0 quadrotor.

GTJun 12, 2024
Coordination in Noncooperative Multiplayer Matrix Games via Reduced Rank Correlated Equilibria

Jaehan Im, Yue Yu, David Fridovich-Keil et al.

Coordination in multiplayer games enables players to avoid the lose-lose outcome that often arises at Nash equilibria. However, designing a coordination mechanism typically requires the consideration of the joint actions of all players, which becomes intractable in large-scale games. We develop a novel coordination mechanism, termed reduced rank correlated equilibria, which reduces the number of joint actions to be considered and thereby mitigates computational complexity. The idea is to approximate the set of all joint actions with the actions used in a set of pre-computed Nash equilibria via a convex hull operation. In a game with n players and each player having m actions, the proposed mechanism reduces the number of joint actions considered from O(m^n) to O(mn). We demonstrate the application of the proposed mechanism to an air traffic queue management problem. Compared with the correlated equilibrium-a popular benchmark coordination mechanism-the proposed approach is capable of solving a problem involving four thousand times more joint actions while yielding similar or better performance in terms of a fairness indicator and showing a maximum optimality gap of 0.066% in terms of the average delay cost. In the meantime, it yields a solution that shows up to 99.5% improvement in a fairness indicator and up to 50.4% reduction in average delay cost compared to the Nash solution, which does not involve coordination.

GTAug 15, 2023
Active Inverse Learning in Stackelberg Trajectory Games

William Ward, Yue Yu, Jacob Levy et al.

Game-theoretic inverse learning is the problem of inferring a player's objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates best describes the follower's objective function. Instead of using passively observed trajectories like existing methods, we actively maximize the differences in the follower's trajectories under different hypotheses by optimizing the leader's control inputs. Compared with uniformly random inputs, the optimized inputs accelerate the convergence of the estimated probability of different hypotheses conditioned on the follower's trajectory. We demonstrate the proposed method in a receding-horizon repeated trajectory game and simulate the results using virtual TurtleBots in Gazebo.

ROSep 14, 2023
Connected Autonomous Vehicle Motion Planning with Video Predictions from Smart, Self-Supervised Infrastructure

Jiankai Sun, Shreyas Kousik, David Fridovich-Keil et al. · gatech

Connected autonomous vehicles (CAVs) promise to enhance safety, efficiency, and sustainability in urban transportation. However, this is contingent upon a CAV correctly predicting the motion of surrounding agents and planning its own motion safely. Doing so is challenging in complex urban environments due to frequent occlusions and interactions among many agents. One solution is to leverage smart infrastructure to augment a CAV's situational awareness; the present work leverages a recently proposed "Self-Supervised Traffic Advisor" (SSTA) framework of smart sensors that teach themselves to generate and broadcast useful video predictions of road users. In this work, SSTA predictions are modified to predict future occupancy instead of raw video, which reduces the data footprint of broadcast predictions. The resulting predictions are used within a planning framework, demonstrating that this design can effectively aid CAV motion planning. A variety of numerical experiments study the key factors that make SSTA outputs useful for practical CAV planning in crowded urban environments.

GTMar 31, 2023
Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning

Shenghui Chen, Yue Yu, David Fridovich-Keil et al.

Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.

GTMar 14
Chance-Constrained Correlated Equilibria for Robust Noncooperative Coordination

Jaehan Im, Ufuk Topcu, David Fridovich-Keil

Correlated equilibria enable a coordinator to influence the self-interested agents by recommending actions that no player has an incentive to deviate from. However, the effectiveness of this mechanism relies on accurate knowledge of the agents' cost structures. When cost parameters are uncertain, the recommended actions may no longer be incentive compatible, allowing agents to benefit from deviating from them. We study a chance-constrained correlated equilibrium problem formulation that accounts for uncertainty in agents' costs and guarantees incentive compatibility with a prescribed confidence level. We derive sensitivity results that quantify how uncertainty in individual incentive constraints affects the expected coordination outcome. In particular, the analysis characterizes the value of information by relating the marginal benefit of reducing uncertainty to the dual sensitivities of the incentive constraints, providing guidance on which sources of uncertainty should be prioritized for information acquisition. The results further reveal that increasing the confidence level is not always beneficial and can introduce a tradeoff between robustness and system efficiency. Numerical experiments demonstrate that the proposed framework maintains coordination performance in uncertain environments and are consistent with the theoretical insights developed in the analysis.

GTSep 30, 2023
When Should a Leader Act Suboptimally? The Role of Inferability in Repeated Stackelberg Games

Mustafa O. Karabag, Sophia Smith, Negar Mehr et al.

When interacting with other decision-making agents in non-adversarial scenarios, it is critical for an autonomous agent to have inferable behavior: The agent's actions must convey their intention and strategy. We model the inferability problem using Stackelberg games with observations where a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed mixed strategy. The follower does not know the leader's strategy and dynamically reacts to the statistically inferred strategy based on the leader's previous actions. In the inference setting, the leader may have a lower performance compared to the setting where the follower has full information on the leader's strategy. We refer to the performance gap between these settings as the inferability gap. For a variety of game settings, we show that the inferability gap is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. We also analyze bimatrix Stackelberg games and identify a set of games where the leader's near-optimal strategy may potentially suffer from a large inferability gap.

GTApr 1
Scalable Coordination with Chance-Constrained Correlated Equilibria via Reduced-Rank Structure

Jaehan Im, David Fridovich-Keil, Ufuk Topcu

Correlated equilibria provide a mechanism for coordinating noncooperative agents through incentive-compatible recommendations, but their guarantees degrade under uncertainty in agents' cost structures. Chance-constrained correlated equilibrium addresses this issue by enforcing incentive compatibility with probabilistic guarantees, but computing such equilibria remains intractable in large-scale coordination problems due to the exponential growth of the joint action space. We develop an approximation method for computing chance-constrained correlated equilibria by showing that these equilibria admit a representation as convex combinations of a finite set of chance-constrained pure Nash equilibria, enabling tractable computation without solving the full correlated equilibrium program. Numerical experiments on large-scale multi-airline coordination scenarios demonstrate substantial reductions in computation time while achieving lower system delay costs compared to current operational practice. Under cost uncertainty, the proposed method consistently achieves lower deviation rate compared to the full formulation while achieving comparable coordination performance.

GTMar 19
Interleaved Information Structures in Dynamic Games: A General Framework with Application to the Linear-Quadratic Case

Janani S K, Kushagra Gupta, Ufuk Topcu et al.

A fundamental problem in noncooperative dynamic game theory is the computation of Nash equilibria under different information structures, which specify the information available to each agent during decision-making. Prior work has extensively studied equilibrium solutions for two canonical information structures: feedback, where agents observe the current state at each time, and open-loop, where agents only observe the initial state. However, these paradigms are often too restrictive to capture realistic settings exhibiting interleaved information structures, in which each agent observes only a subset of other agents at every timestep. To date, there is no systematic framework for modeling and solving dynamic games under arbitrary interleaved information structures. To this end, we make two main contributions. First, we introduce a method to model deterministic dynamic games with arbitrary interleaved information structures as Mathematical Program Networks (MPNs), where the network structure encodes the informational dependencies between agents. Second, for linear-quadratic (LQ) dynamic games, we leverage the MPN formulation to develop a systematic procedure for deriving Riccati-like equations that characterize Nash equilibria. Finally, we illustrate our approach through an example involving three agents exhibiting a cyclic information structure.

SYMar 17
Linear-Quadratic Gaussian Games with Distributed Sparse Estimation

Tianyu Qiu, Filippos Fotiadis, Xinjie Liu et al.

Linear-quadratic Gaussian games provide a framework for modeling strategic interactions in multi-agent systems, where agents must estimate system states from noisy observations while also making decisions to optimize a quadratic cost. However, these formulations usually require agents to utilize the full set of available observations when forming their state estimates, which can be unrealistic in large-scale or resource-constrained settings. In this paper, we consider linear-quadratic Gaussian games with sparse interagent observations. To enforce sparsity in the estimation stage, we design a distributed estimator that balances estimation effectiveness with interagent measurement sparsity via a group lasso problem, while agents implement feedback Nash strategies based on their state estimates. We provide sufficient conditions under which the sparse estimator is guaranteed to trigger a corrective reset to the optimal estimation gain, ensuring that estimation quality does not degrade beyond a level determined by the regularization parameters. Simulations on a formation game show that the proposed approach yields a significant reduction in communication resources consumed while only minimally affecting the nominal equilibrium trajectories.

LGJan 2
Bayesian Inverse Games with High-Dimensional Multi-Modal Observations

Yash Jain, Xinjie Liu, Lasse Peters et al.

Many multi-agent interaction scenarios can be naturally modeled as noncooperative games, where each agent's decisions depend on others' future actions. However, deploying game-theoretic planners for autonomous decision-making requires a specification of all agents' objectives. To circumvent this practical difficulty, recent work develops maximum likelihood techniques for solving inverse games that can identify unknown agent objectives from interaction data. Unfortunately, these methods only infer point estimates and do not quantify estimator uncertainty; correspondingly, downstream planning decisions can overconfidently commit to unsafe actions. We present an approximate Bayesian inference approach for solving the inverse game problem, which can incorporate observation data from multiple modalities and be used to generate samples from the Bayesian posterior over the hidden agent objectives given limited sensor observations in real time. Concretely, the proposed Bayesian inverse game framework trains a structured variational autoencoder with an embedded differentiable Nash game solver on interaction datasets and does not require labels of agents' true objectives. Extensive experiments show that our framework successfully learns prior and posterior distributions, improves inference quality over maximum likelihood estimation-based inverse game approaches, and enables safer downstream decision-making without sacrificing efficiency. When trajectory information is uninformative or unavailable, multimodal inference further reduces uncertainty by exploiting additional observation modalities.

MAJul 13, 2024
Inferring Occluded Agent Behavior in Dynamic Games from Noise Corrupted Observations

Tianyu Qiu, David Fridovich-Keil

In mobile robotics and autonomous driving, it is natural to model agent interactions as the Nash equilibrium of a noncooperative, dynamic game. These methods inherently rely on observations from sensors such as lidars and cameras to identify agents participating in the game and, therefore, have difficulty when some agents are occluded. To address this limitation, this paper presents an occlusion-aware game-theoretic inference method to estimate the locations of potentially occluded agents, and simultaneously infer the intentions of both visible and occluded agents, which best accounts for the observations of visible agents. Additionally, we propose a receding horizon planning strategy based on an occlusion-aware contingency game designed to navigate in scenarios with potentially occluded agents. Monte Carlo simulations validate our approach, demonstrating that it accurately estimates the game model and trajectories for both visible and occluded agents using noisy observations of visible agents. Our planning pipeline significantly enhances navigation safety when compared to occlusion-ignorant baseline as well.

LGSep 20, 2023
Symbolic Regression on Sparse and Noisy Data with Gaussian Processes

Junette Hsin, Shubhankar Agarwal, Adam Thorpe et al.

In this paper, we address the challenge of deriving dynamical models from sparse and noisy data. High-quality data is crucial for symbolic regression algorithms; limited and noisy data can present modeling challenges. To overcome this, we combine Gaussian process regression with a sparse identification of nonlinear dynamics (SINDy) method to denoise the data and identify nonlinear dynamical equations. Our approach GPSINDy offers improved robustness with sparse, noisy data compared to SINDy alone. We demonstrate its effectiveness on simulation data from Lotka-Volterra and unicycle models and hardware data from an NVIDIA JetRacer system. We show superior performance over baselines including more than 50% improvement over SINDy and other baselines in predicting future trajectories from noise-corrupted and sparse 5 Hz data.

ROSep 22, 2022
Robust Forecasting for Robotic Control: A Game-Theoretic Approach

Shubhankar Agarwal, David Fridovich-Keil, Sandeep P. Chinchali

Modern robots require accurate forecasts to make optimal decisions in the real world. For example, self-driving cars need an accurate forecast of other agents' future actions to plan safe trajectories. Current methods rely heavily on historical time series to accurately predict the future. However, relying entirely on the observed history is problematic since it could be corrupted by noise, have outliers, or not completely represent all possible outcomes. To solve this problem, we propose a novel framework for generating robust forecasts for robotic control. In order to model real-world factors affecting future forecasts, we introduce the notion of an adversary, which perturbs observed historical time series to increase a robot's ultimate control cost. Specifically, we model this interaction as a zero-sum two-player game between a robot's forecaster and this hypothetical adversary. We show that our proposed game may be solved to a local Nash equilibrium using gradient-based optimization techniques. Furthermore, we show that a forecaster trained with our method performs 30.14% better on out-of-distribution real-world lane change data than baselines.

LGNov 28, 2023
An Investigation of Time Reversal Symmetry in Reinforcement Learning

Brett Barkley, Amy Zhang, David Fridovich-Keil

One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specifically, we investigate the utility of this concept in reducing the sample complexity of reinforcement learning. We observe that utilizing the structure of time reversal in an MDP allows every environment transition experienced by an agent to be transformed into a feasible reverse-time transition, effectively doubling the number of experiences in the environment. To test the usefulness of this newly synthesized data, we develop a novel approach called time symmetric data augmentation (TSDA) and investigate its application in both proprioceptive and pixel-based state within the realm of off-policy, model-free RL. Empirical evaluations showcase how these synthetic transitions can enhance the sample efficiency of RL agents in time reversible scenarios without friction or contact. We also test this method in more realistic environments where these assumptions are not globally satisfied. We find that TSDA can significantly degrade sample efficiency and policy performance, but can also improve sample efficiency under the right conditions. Ultimately we conclude that time symmetry shows promise in enhancing the sample efficiency of reinforcement learning and provide guidance when the environment and reward structures are of an appropriate form for TSDA to be employed effectively.

LGApr 3
Neural Operators for Multi-Task Control and Adaptation

David Sewell, Xingjian Li, Stepan Tretiakov et al.

Neural operator methods have emerged as powerful tools for learning mappings between infinite-dimensional function spaces, yet their potential in optimal control remains largely unexplored. We focus on multi-task control problems, whose solution is a mapping from task description (e.g., cost or dynamics functions) to optimal control law (e.g., feedback policy). We approximate these solution operators using a permutation-invariant neural operator architecture. Across a range of parametric optimal control environments and a locomotion benchmark, a single operator trained via behavioral cloning accurately approximates the solution operator and generalizes to unseen tasks, out-of-distribution settings, and varying amounts of task observations. We further show that the branch-trunk structure of our neural operator architecture enables efficient and flexible adaptation to new tasks. We develop structured adaptation strategies ranging from lightweight updates to full-network fine-tuning, achieving strong performance across different data and compute settings. Finally, we introduce meta-trained operator variants that optimize the initialization for few-shot adaptation. These methods enable rapid task adaptation with limited data and consistently outperform a popular meta-learning baseline. Together, our results demonstrate that neural operators provide a unified and efficient framework for multi-task control and adaptation.

ROMar 16
Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation

Jacob Levy, Tyler Westenbroek, Kevin Huang et al.

Simulation-to-real transfer remains a central challenge in robotics, as mismatches between simulated and real-world dynamics often lead to failures. While reinforcement learning offers a principled mechanism for adaptation, existing sim-to-real finetuning methods struggle with exploration and long-horizon credit assignment in the low-data regimes typical of real-world robotics. We introduce Simulation Distillation (SimDist), a sim-to-real framework that distills structural priors from a simulator into a latent world model and enables rapid real-world adaptation via online planning and supervised dynamics finetuning. By transferring reward and value models directly from simulation, SimDist provides dense planning signals from raw perception without requiring value learning during deployment. As a result, real-world adaptation reduces to short-horizon system identification, avoiding long-horizon credit assignment and enabling fast, stable improvement. Across precise manipulation and quadruped locomotion tasks, SimDist substantially outperforms prior methods in data efficiency, stability, and final performance. Project website and code: https://sim-dist.github.io/

SYMay 20
Secure Coordination for Vertiport Sequencing in Advanced Air Mobility

Jaehan Im, Filippos Fotiadis, Ufuk Topcu et al.

Advanced air mobility operations will require reliable coordination mechanisms for managing dense traffic near vertiports. However, sequencing decisions may become vulnerable when they rely on potentially falsified self-reported information such as estimated time of arrival. Self-interested vehicles may misreport their arrival times to obtain favorable landing priority, while malicious actors may spoof information to disrupt sequencing decisions or induce unnecessary congestion. This paper studies secure coordination for vertiport sequencing under sensing uncertainty. We consider a coordinator that combines self-reported Remote-ID information with externally obtained surveillance measurements to check reports and assign separation-feasible arrival schedules. Since surveillance-based estimates are uncertain, falsified reports may remain consistent with the sensing uncertainty region and cannot always be rejected outright. We therefore formulate sequencing as a robust design problem over this uncertainty region. Self-interested misreporting is modeled as a strategic deviation that improves the reporting vehicle's own sequencing outcome, whereas malicious spoofing is modeled as an adversarial disturbance that degrades the system-level objective. The final paper will develop robust sequencing rules over surveillance-consistent uncertainty sets and evaluate their performance in representative vertiport sequencing scenarios.

LGJul 16, 2023
Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

Tyler Westenbroek, Jacob Levy, David Fridovich-Keil

We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach $1)$ uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and $2)$ uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.

LGMay 7
Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning

Brett Barkley, Preston Culbertson, David Fridovich-Keil

Models trained with deep learning often fail to signal when inputs fall outside their training data manifold, leading to unreliable predictions under distribution shift. Prior work suggests that effective out-of-distribution (OOD) detection often requires class-conditional modeling or specialized models obtained through supervised fine-tuning. We revisit this assumption in modern pretrained models and show that their frozen representations already encode sufficient geometric structure for accurate label-free OOD detection. Across 59 backbone-task pairings spanning vision and language, we compare two complementary label-free detectors: a global Mahalanobis estimator fit on unlabeled latent representations, and ReSCOPED, a lightweight, diffusion-based typicality estimator operating on the same features at a local level. Despite their different detection mechanisms, representation scaling reveals a consistent regime-dependent pattern: both local and global detectors' absolute performance improves with better representation quality, and performance gaps between the two detectors disappear across both language and vision tasks as representations scale. These results suggest that label-free OOD detection depends strongly on the geometry exposed by frozen pretrained backbones, reducing the importance of detector choice as backbone scale increases and enabling efficient deployment directly on frozen models.

ROApr 1
A Player Selection Network for Scalable Game-Theoretic Prediction and Planning

Tianyu Qiu, Eric Ouano, Fernando Palafox et al.

While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large optimization problems where the number of variables increases with the number of agents, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose 1) PSN Game-a learning-based, game-theoretic prediction and planning framework that reduces game size by learning a Player Selection Network (PSN); and 2) a Goal Inference Network (GIN) that makes it possible to use the PSN in incomplete-information games where other agents' intentions are unknown to the ego agent. A PSN outputs a player selection mask that distinguishes influential players from less relevant ones, enabling the ego player to solve a smaller, masked game involving only selected players. By reducing the number of players included in the game, PSN shrinks the corresponding optimization problems, leading to faster solve times. Experiments in both simulated scenarios and real-world pedestrian trajectory datasets show that PSN is competitive with, and often improves upon, the evaluated explicit game-theoretic selection baselines in 1) prediction accuracy and 2) planning safety. Across scenarios, PSN typically selects substantially fewer players than are present in the full game, thereby reducing game size and planning complexity. PSN also generalizes to settings in which agents' objectives are unknown, via the GIN, without test-time fine-tuning. By selecting only the most relevant players for decision-making, PSN Game provides a practical mechanism for reducing planning complexity that can be integrated into existing multi-agent planning frameworks.

GTMar 27
Breaking Exponential Complexity in Games of Ordered Preference: A Tractable Reformulation

Dong Ho Lee, Jingqi Li, Lasse Peters et al.

Games of ordered preference (GOOPs) model multi-player equilibrium problems in which each player maintains a distinct hierarchy of strictly prioritized objectives. Existing approaches solve GOOPs by deriving and enforcing the necessary optimality conditions that characterize lexicographically constrained Nash equilibria through a single-level reformulation. However, the number of primal and dual variables in the resulting KKT system grows exponentially with the number of preference levels, leading to severe scalability challenges. We derive a compact reformulation of these necessary conditions that preserves the essential primal stationarity structure across hierarchy levels, yielding a "reduced" KKT system whose size grows polynomially with both the number of players and the number of preference levels. The reduced system constitutes a relaxation of the complete KKT system, yet it remains a valid necessary condition for local GOOP equilibria. For GOOPs with quadratic objectives and linear constraints, we prove that the primal solution sets of the reduced and complete KKT systems coincide. More generally, for GOOPs with arbitrary (but smooth) nonlinear objectives and constraints, the reduced KKT conditions recover all local GOOP equilibria but may admit spurious non-equilibrium solutions. We introduce a second-order sufficient condition to certify when a candidate point corresponds to a local GOOP equilibrium. We also develop a primal-dual interior-point method for computing a local GOOP equilibrium with local quadratic convergence. The resulting framework enables scalable and efficient computation of GOOP equilibria beyond the tractable range of existing exponentially complex formulations.

GTMay 14
Efficiently Solving Mixed-Hierarchy Games with Quasi-Policy Approximations

Hamzah Khan, Dong Ho Lee, Jingqi Li et al.

Multi-robot coordination often exhibits hierarchical structure, with some robots' decisions depending on the planned behaviors of others. While game theory provides a principled framework for such interactions, existing solvers struggle to handle mixed information structures that combine simultaneous (Nash) and hierarchical (Stackelberg) decision-making. We study N-robot forest-structured mixed-hierarchy games, in which each robot acts as a Stackelberg leader over its subtree while robots in different branches interact via Nash equilibria. We derive the Karush-Kuhn-Tucker (KKT) first-order optimality conditions for this class of games and show that they involve increasingly high-order derivatives of robots' best-response policies as the hierarchy depth grows, rendering a direct solution intractable. To overcome this challenge, we introduce a quasi-policy approximation that removes higher-order policy derivatives and develop an inexact Newton method for efficiently solving the resulting approximated KKT systems. We prove local exponential convergence of the proposed algorithm for games with non-quadratic objectives and nonlinear constraints. The approach is implemented in a highly optimized Julia library (MixedHierarchyGames.jl) and evaluated in hardware and simulated multi-agent experiments, demonstrating real-time convergence for complex mixed-hierarchy information structures.

LGJan 29
Generalized Information Gathering Under Dynamics Uncertainty

Fernando Palafox, Jingqi Li, Jesse Milzman et al.

An agent operating in an unknown dynamical system must learn its dynamics from observations. Active information gathering accelerates this learning, but existing methods derive bespoke costs for specific modeling choices: dynamics models, belief update procedures, observation models, and planners. We present a unifying framework that decouples these choices from the information-gathering cost by explicitly exposing the causal dependencies between parameters, beliefs, and controls. Using this framework, we derive a general information-gathering cost based on Massey's directed information that assumes only Markov dynamics with additive noise and is otherwise agnostic to modeling choices. We prove that the mutual information cost used in existing literature is a special case of our cost. Then, we leverage our framework to establish an explicit connection between the mutual information cost and information gain in linearized Bayesian estimation, thereby providing theoretical justification for mutual information-based active learning approaches. Finally, we illustrate the practical utility of our framework through experiments spanning linear, nonlinear, and multi-agent systems.

LGMay 11
Controllability in preference-conditioned multi-objective reinforcement learning

Pau de las Heras Molins, Beyazit Yalcinkaya, Lasse Peters et al.

Multi-objective reinforcement learning (MORL) allows a user to express preference over outcomes in terms of the relative importance of the objectives, but standard metrics cannot capture whether changes in preference reliably change the agent's behavior in the intended way, a property termed controllability. As a result, preference-conditioned agents can score well on standard MORL metrics while being insensitive to the preference input. If the ability to control agents cannot be reliably assessed, the symbolic interface that MORL provides between user intent and agent behavior is broken. Mainstream MORL metrics alone fail to measure the controllability of preference-conditioned agents, motivating a complementary metric specifically designed to that end. We hope the results spur discussion in the community on existing evaluation protocols to consolidate advances in preference adaptation in MORL to larger and more complex problems.

LGSep 28, 2025Code
DiBS-MTL: Transformation-Invariant Multitask Learning with Direction Oracles

Surya Murthy, Kushagra Gupta, Mustafa O. Karabag et al.

Multitask learning (MTL) algorithms typically rely on schemes that combine different task losses or their gradients through weighted averaging. These methods aim to find Pareto stationary points by using heuristics that require access to task loss values, gradients, or both. In doing so, a central challenge arises because task losses can be arbitrarily, nonaffinely scaled relative to one another, causing certain tasks to dominate training and degrade overall performance. A recent advance in cooperative bargaining theory, the Direction-based Bargaining Solution (DiBS), yields Pareto stationary solutions immune to task domination because of its invariance to monotonic nonaffine task loss transformations. However, the convergence behavior of DiBS in nonconvex MTL settings is currently not understood. To this end, we prove that under standard assumptions, a subsequence of DiBS iterates converges to a Pareto stationary point when task losses are possibly nonconvex, and propose DiBS-MTL, a computationally efficient adaptation of DiBS to the MTL setting. Finally, we validate DiBS-MTL empirically on standard MTL benchmarks, showing that it achieves competitive performance with state-of-the-art methods while maintaining robustness to nonaffine monotonic transformations that significantly degrade the performance of existing approaches, including prior bargaining-inspired MTL methods. Code available at https://github.com/suryakmurthy/dibs-mtl.

LGMay 7
A Flow Matching Algorithm for Many-Shot Adaptation to Unseen Distributions

Tyler Ingebrand, Ruihan Zhao, Kushagra Gupta et al.

While generative modeling has achieved remarkable success on tasks like natural language-conditioned image generation, enabling model adaptation from example data points remains a relatively underexplored and challenging problem. To this end, we propose Function Projection for Flow Matching (FP-FM), an algorithm that directly conditions generation on samples from the target distribution. FP-FM learns basis functions to span the velocity fields corresponding to a set of training distributions, and adapts to new distributions by computing a simple least-squares projection onto this basis. This enables efficient generation of samples from diverse target distributions without additional training at inference time. We further introduce multiple variants of FP-FM that provide a trade-off in expressivity and compute by enriching the coefficient calculation, e.g., by making the coefficients dependent on time. FP-FM achieves greatly improved precision and recall relative to baselines across synthetic and image-based datasets, with especially strong gains on unseen distributions.

ROOct 11, 2024
Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models

Jacob Levy, Tyler Westenbroek, David Fridovich-Keil

Traditionally, model-based reinforcement learning (MBRL) methods exploit neural networks as flexible function approximators to represent $\textit{a priori}$ unknown environment dynamics. However, training data are typically scarce in practice, and these black-box models often fail to generalize. Modeling architectures that leverage known physics can substantially reduce the complexity of system-identification, but break down in the face of complex phenomena such as contact. We introduce a novel framework for learning semi-structured dynamics models for contact-rich systems which seamlessly integrates structured first principles modeling techniques with black-box auto-regressive models. Specifically, we develop an ensemble of probabilistic models to estimate external forces, conditioned on historical observations and actions, and integrate these predictions using known Lagrangian dynamics. With this semi-structured approach, we can make accurate long-horizon predictions with substantially less data than prior methods. We leverage this capability and propose Semi-Structured Reinforcement Learning ($\texttt{SSRL}$) a simple model-based learning framework which pushes the sample complexity boundary for real-world learning. We validate our approach on a real-world Unitree Go1 quadruped robot, learning dynamic gaits -- from scratch -- on both hard and soft surfaces with just a few minutes of real-world data. Video and code are available at: https://sites.google.com/utexas.edu/ssrl

ROFeb 14, 2024
Auto-Encoding Bayesian Inverse Games

Xinjie Liu, Lasse Peters, Javier Alonso-Mora et al.

When multiple agents interact in a common environment, each agent's actions impact others' future decisions, and noncooperative dynamic games naturally capture this coupling. In interactive motion planning, however, agents typically do not have access to a complete model of the game, e.g., due to unknown objectives of other players. Therefore, we consider the inverse game problem, in which some properties of the game are unknown a priori and must be inferred from observations. Existing maximum likelihood estimation (MLE) approaches to solve inverse games provide only point estimates of unknown parameters without quantifying uncertainty, and perform poorly when many parameter values explain the observed behavior. To address these limitations, we take a Bayesian perspective and construct posterior distributions of game parameters. To render inference tractable, we employ a variational autoencoder (VAE) with an embedded differentiable game solver. This structured VAE can be trained from an unlabeled dataset of observed interactions, naturally handles continuous, multi-modal distributions, and supports efficient sampling from the inferred posteriors without computing game solutions at runtime. Extensive evaluations in simulated driving scenarios demonstrate that the proposed approach successfully learns the prior and posterior game parameter distributions, provides more accurate objective estimates than MLE baselines, and facilitates safer and more efficient game-theoretic motion planning.

LGDec 18, 2024
Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning

Brett Barkley, David Fridovich-Keil

Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observations. We show that, while DMBRL algorithms perform well in OpenAI Gym, their performance can drop significantly in DeepMind Control Suite (DMC), even though these settings offer similar tasks and identical physics backends. Modern techniques designed to address several key issues that arise in these settings do not provide a consistent improvement across all environments, and overall our results show that adding synthetic rollouts to the training process -- the backbone of Dyna-style algorithms -- significantly degrades performance across most DMC environments. Our findings contribute to a deeper understanding of several fundamental challenges in model-based RL and show that, like many optimization fields, there is no free lunch when evaluating performance across diverse benchmarks in RL.

SYMar 18, 2024
Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning

Antonio Lopez, David Fridovich-Keil

Recent methods using Reinforcement Learning (RL) have proven to be successful for training intelligent agents in unknown environments. However, RL has not been applied widely in real-world robotics scenarios. This is because current state-of-the-art RL methods require large amounts of data to learn a specific task, leading to unreasonable costs when deploying the agent to collect data in real-world applications. In this paper, we build from existing work that reshapes the reward function in RL by introducing a Control Lyapunov Function (CLF), which is demonstrated to reduce the sample complexity. Still, this formulation requires knowing a CLF of the system, but due to the lack of a general method, it is often a challenge to identify a suitable CLF. Existing work can compute low-dimensional CLFs via a Hamilton-Jacobi reachability procedure. However, this class of methods becomes intractable on high-dimensional systems, a problem that we address by using a system decomposition technique to compute what we call Decomposed Control Lyapunov Functions (DCLFs). We use the computed DCLF for reward shaping, which we show improves RL performance. Through multiple examples, we demonstrate the effectiveness of this approach, where our method finds a policy to successfully land a quadcopter in less than half the amount of real-world data required by the state-of-the-art Soft-Actor Critic algorithm.

ROApr 23, 2025
Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving

Jacob Levy, Jason Gibson, Bogdan Vlahov et al.

High-speed off-road autonomous driving presents unique challenges due to complex, evolving terrain characteristics and the difficulty of accurately modeling terrain-vehicle interactions. While dynamics models used in model-based control can be learned from real-world data, they often struggle to generalize to unseen terrain, making real-time adaptation essential. We propose a novel framework that combines a Kalman filter-based online adaptation scheme with meta-learned parameters to address these challenges. Offline meta-learning optimizes the basis functions along which adaptation occurs, as well as the adaptation parameters, while online adaptation dynamically adjusts the onboard dynamics model in real time for model-based control. We validate our approach through extensive experiments, including real-world testing on a full-scale autonomous off-road vehicle, demonstrating that our method outperforms baseline approaches in prediction accuracy, performance, and safety metrics, particularly in safety-critical scenarios. Our results underscore the effectiveness of meta-learned dynamics model adaptation, advancing the development of reliable autonomous systems capable of navigating diverse and unseen environments. Video is available at: https://youtu.be/cCKHHrDRQEA

ROOct 14, 2025
UNCAP: Uncertainty-Guided Planning Using Natural Language Communication for Cooperative Autonomous Vehicles

Neel P. Bhatt, Po-han Li, Kushagra Gupta et al.

Safe large-scale coordination of multiple cooperative connected autonomous vehicles (CAVs) hinges on communication that is both efficient and interpretable. Existing approaches either rely on transmitting high-bandwidth raw sensor data streams or neglect perception and planning uncertainties inherent in shared data, resulting in systems that are neither scalable nor safe. To address these limitations, we propose Uncertainty-Guided Natural Language Cooperative Autonomous Planning (UNCAP), a vision-language model-based planning approach that enables CAVs to communicate via lightweight natural language messages while explicitly accounting for perception uncertainty in decision-making. UNCAP features a two-stage communication protocol: (i) an ego CAV first identifies the subset of vehicles most relevant for information exchange, and (ii) the selected CAVs then transmit messages that quantitatively express their perception uncertainty. By selectively fusing messages that maximize mutual information, this strategy allows the ego vehicle to integrate only the most relevant signals into its decision-making, improving both the scalability and reliability of cooperative planning. Experiments across diverse driving scenarios show a 63% reduction in communication bandwidth with a 31% increase in driving safety score, a 61% reduction in decision uncertainty, and a four-fold increase in collision distance margin during near-miss events. Project website: https://uncap-project.github.io/

CVMay 8, 2025
Real-Time Privacy Preservation for Robot Visual Perception

Minkyu Choi, Yunhao Yang, Neel P. Bhatt et al.

Many robots (e.g., iRobot's Roomba) operate based on visual observations from live video streams, and such observations may inadvertently include privacy-sensitive objects, such as personal identifiers. Existing approaches for preserving privacy rely on deep learning models, differential privacy, or cryptography. They lack guarantees for the complete concealment of all sensitive objects. Guaranteeing concealment requires post-processing techniques and thus is inadequate for real-time video streams. We develop a method for privacy-constrained video streaming, PCVS, that conceals sensitive objects within real-time video streams. PCVS takes a logical specification constraining the existence of privacy-sensitive objects, e.g., never show faces when a person exists. It uses a detection model to evaluate the existence of these objects in each incoming frame. Then, it blurs out a subset of objects such that the existence of the remaining objects satisfies the specification. We then propose a conformal prediction approach to (i) establish a theoretical lower bound on the probability of the existence of these objects in a sequence of frames satisfying the specification and (ii) update the bound with the arrival of each subsequent frame. Quantitative evaluations show that PCVS achieves over 95 percent specification satisfaction rate in multiple datasets, significantly outperforming other methods. The satisfaction rate is consistently above the theoretical bounds across all datasets, indicating that the established bounds hold. Additionally, we deploy PCVS on robots in real-time operation and show that the robots operate normally without being compromised when PCVS conceals objects.

LGMar 23, 2025
A Framework for Finding Local Saddle Points in Two-Player Zero-Sum Black-Box Games

Shubhankar Agarwal, Hamzah I. Khan, Sandeep P. Chinchali et al.

Saddle point optimization is a critical problem employed in numerous real-world applications, including portfolio optimization, generative adversarial networks, and robotics. It has been extensively studied in cases where the objective function is known and differentiable. Existing work in black-box settings with unknown objectives that can only be sampled either assumes convexity-concavity in the objective to simplify the problem or operates with noisy gradient estimators. In contrast, we introduce a framework inspired by Bayesian optimization which utilizes Gaussian processes to model the unknown (potentially nonconvex-nonconcave) objective and requires only zeroth-order samples. Our approach frames the saddle point optimization problem as a two-level process which can flexibly integrate existing and novel approaches to this problem. The upper level of our framework produces a model of the objective function by sampling in promising locations, and the lower level of our framework uses the existing model to frame and solve a general-sum game to identify locations to sample. This lower level procedure can be designed in complementary ways, and we demonstrate the flexibility of our approach by introducing variants which appropriately trade off between factors like runtime, the cost of function evaluations, and the number of available initial samples. We experimentally demonstrate these algorithms on synthetic and realistic datasets in black-box nonconvex-nonconcave settings, showcasing their ability to efficiently locate local saddle points in these contexts.

LGMar 7, 2025
A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation

Xinjie Liu, Cyrus Neary, Kushagra Gupta et al.

Many reinforcement learning (RL) algorithms are impractical for deployment in operational systems or for training with computationally expensive high-fidelity simulations, as they require large amounts of data. Meanwhile, low-fidelity simulators -- such as reduced-order models, heuristic rewards, or generative world models -- can cheaply provide useful data for RL training, even if they are too coarse for zero-shot transfer. We propose multi-fidelity policy gradients (MFPGs), an RL framework that mixes a small amount of data from the target environment with a control variate formed from a large volume of low-fidelity simulation data to construct an unbiased, variance-reduced estimator for on-policy policy gradients. We instantiate the framework with a multi-fidelity variant of the classical REINFORCE algorithm. We show that under standard assumptions, the MFPG estimator guarantees asymptotic convergence of REINFORCE to locally optimal policies in the target environment, and achieves faster finite-sample convergence rates compared to training with high-fidelity data alone. Empirically, we evaluate the MFPG algorithm across a suite of simulated robotics benchmark tasks with limited high-fidelity data but abundant off-dynamics, low-fidelity data. With mild-moderate dynamics gaps, MFPG reliably improves the median performance over a high-fidelity-only baseline, matching the performance of leading multi-fidelity baselines despite its simplicity and minimal tuning overhead. Under large dynamics gaps, MFPG demonstrates the strongest robustness among the evaluated multi-fidelity approaches. An additional experiment shows that MFPG can remain effective even under low-fidelity reward misspecification. Thus, MFPG not only offers a novel paradigm for efficient sim-to-real transfer but also provides a principled approach to managing the trade-off between policy performance and data collection costs.

LGDec 2, 2024
Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations

Cevahir Koprulu, Po-han Li, Tianyu Qiu et al.

Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.

LGOct 9, 2025
Deceptive Exploration in Multi-armed Bandits

I. Arda Vurankaya, Mustafa O. Karabag, Wesley A. Suttle et al.

We consider a multi-armed bandit setting in which each arm has a public and a private reward distribution. An observer expects an agent to follow Thompson Sampling according to the public rewards, however, the deceptive agent aims to quickly identify the best private arm without being noticed. The observer can observe the public rewards and the pulled arms, but not the private rewards. The agent, on the other hand, observes both the public and private rewards. We formalize detectability as a stepwise Kullback-Leibler (KL) divergence constraint between the actual pull probabilities used by the agent and the anticipated pull probabilities by the observer. We model successful pulling of public suboptimal arms as a % Bernoulli process where the success probability decreases with each successful pull, and show these pulls can happen at most at a $Θ(\sqrt{T}) $ rate under the KL constraint. We then formulate a maximin problem based on public and private means, whose solution characterizes the optimal error exponent for best private arm identification. We finally propose an algorithm inspired by top-two algorithms. This algorithm naturally adapts its exploration according to the hardness of pulling arms based on the public suboptimality gaps. We provide numerical examples illustrating the $Θ(\sqrt{T}) $ rate and the behavior of the proposed algorithm.

LGOct 1, 2025
SCOPED: Score-Curvature Out-of-distribution Proximity Evaluator for Diffusion

Brett Barkley, Preston Culbertson, David Fridovich-Keil

Out-of-distribution (OOD) detection is essential for reliable deployment of machine learning systems in vision, robotics, reinforcement learning, and beyond. We introduce Score-Curvature Out-of-distribution Proximity Evaluator for Diffusion (SCOPED), a fast and general-purpose OOD detection method for diffusion models that reduces the number of forward passes on the trained model by an order of magnitude compared to prior methods, outperforming most diffusion-based baselines and closely approaching the accuracy of the strongest ones. SCOPED is computed from a single diffusion model trained once on a diverse dataset, and combines the Jacobian trace and squared norm of the model's score function into a single test statistic. Rather than thresholding on a fixed value, we estimate the in-distribution density of SCOPED scores using kernel density estimation, enabling a flexible, unsupervised test that, in the simplest case, only requires a single forward pass and one Jacobian-vector product (JVP), made efficient by Hutchinson's trace estimator. On four vision benchmarks, SCOPED achieves competitive or state-of-the-art precision-recall scores despite its low computational cost. The same method generalizes to robotic control tasks with shared state and action spaces, identifying distribution shifts across reward functions and training regimes. These results position SCOPED as a practical foundation for fast and reliable OOD detection in real-world domains, including perceptual artifacts in vision, outlier detection in autoregressive models, exploration in reinforcement learning, and dataset curation for unsupervised training.

LGOct 1, 2025
Fixing That Free Lunch: When, Where, and Why Synthetic Data Fails in Model-Based Policy Optimization

Brett Barkley, David Fridovich-Keil

Synthetic data is a core component of data-efficient Dyna-style model-based reinforcement learning, yet it can also degrade performance. We study when it helps, where it fails, and why, and we show that addressing the resulting failure modes enables policy improvement that was previously unattainable. We focus on Model-Based Policy Optimization (MBPO), which performs actor and critic updates using synthetic action counterfactuals. Despite reports of strong and generalizable sample-efficiency gains in OpenAI Gym, recent work shows that MBPO often underperforms its model-free counterpart, Soft Actor-Critic (SAC), in the DeepMind Control Suite (DMC). Although both suites involve continuous control with proprioceptive robots, this shift leads to sharp performance losses across seven challenging DMC tasks, with MBPO failing in cases where claims of generalization from Gym would imply success. This reveals how environment-specific assumptions can become implicitly encoded into algorithm design when evaluation is limited. We identify two coupled issues behind these failures: scale mismatches between dynamics and reward models that induce critic underestimation and hinder policy improvement during model-policy coevolution, and a poor choice of target representation that inflates model variance and produces error-prone rollouts. Addressing these failure modes enables policy improvement where none was previously possible, allowing MBPO to outperform SAC in five of seven tasks while preserving the strong performance previously reported in OpenAI Gym. Rather than aiming only for incremental average gains, we hope our findings motivate the community to develop taxonomies that tie MDP task- and environment-level structure to algorithmic failure modes, pursue unified solutions where possible, and clarify how benchmark choices ultimately shape the conditions under which algorithms generalize.

MAFeb 25, 2022
Hierarchical Control for Head-to-Head Autonomous Racing

Rishabh Saumil Thakkar, Aryaman Singh Samyal, David Fridovich-Keil et al.

We develop a hierarchical controller for head-to-head autonomous racing. We first introduce a formulation of a racing game with realistic safety and fairness rules. A high-level planner approximates the original formulation as a discrete game with simplified state, control, and dynamics to easily encode the complex safety and fairness rules and calculates a series of target waypoints. The low-level controller takes the resulting waypoints as a reference trajectory and computes high-resolution control inputs by solving an alternative formulation approximation with simplified objectives and constraints. We consider two approaches for the low-level planner, constructing two hierarchical controllers. One approach uses multi-agent reinforcement learning (MARL), and the other solves a linear-quadratic Nash game (LQNG) to produce control inputs. The controllers are compared against three baselines: an end-to-end MARL controller, a MARL controller tracking a fixed racing line, and an LQNG controller tracking a fixed racing line. Quantitative results show that the proposed hierarchical methods outperform their respective baseline methods in terms of head-to-head race wins and abiding by the rules. The hierarchical controller using MARL for low-level control consistently outperformed all other methods by winning over 90% of head-to-head races and more consistently adhered to the complex racing rules. Qualitatively, we observe the proposed controllers mimicking actions performed by expert human drivers such as shielding/blocking, overtaking, and long-term planning for delayed advantages. We show that hierarchical planning for game-theoretic reasoning produces competitive behavior even when challenged with complex rules and constraints.

SYSep 16, 2021
Back to the Future: Efficient, Time-Consistent Solutions in Reach-Avoid Games

Dennis R. Anthony, Duy P. Nguyen, David Fridovich-Keil et al.

We study the class of reach-avoid dynamic games in which multiple agents interact noncooperatively, and each wishes to satisfy a distinct target criterion while avoiding a failure criterion. Reach-avoid games are commonly used to express safety-critical optimal control problems found in mobile robot motion planning. Here, we focus on finding time-consistent solutions, in which future motion plans remain optimal even when a robot diverges from the plan early on due to, e.g., intrinsic dynamic uncertainty or extrinsic environment disturbances. Our main contribution is a computationally-efficient algorithm for multi-agent reach-avoid games which renders time-consistent solutions for all players. We demonstrate our approach in two- and three-player simulated driving scenarios, in which our method provides safe control strategies for all agents.

ROJun 7, 2021
Inferring Objectives in Continuous Dynamic Games from Noise-Corrupted Partial State Observations

Lasse Peters, David Fridovich-Keil, Vicenç Rubies-Royo et al.

Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desired behavior. In this paper, we propose a method for inferring parametric objective models of multiple agents based on observed interactions. Our inverse game solver jointly optimizes player objectives and continuous-state estimates by coupling them through Nash equilibrium constraints. Hence, our method is able to directly maximize the observation likelihood rather than other non-probabilistic surrogate criteria. Our method does not require full observations of game states or player strategies to identify player objectives. Instead, it robustly recovers this information from noisy, partial state observations. As a byproduct of estimating player objectives, our method computes a Nash equilibrium trajectory corresponding to those objectives. Thus, it is suitable for downstream trajectory forecasting tasks. We demonstrate our method in several simulated traffic scenarios. Results show that it reliably estimates player objectives from a short sequence of noise-corrupted partial state observations. Furthermore, using the estimated objectives, our method makes accurate predictions of each player's trajectory.

OCJan 8, 2021
The Computation of Approximate Generalized Feedback Nash Equilibria

Forrest Laine, David Fridovich-Keil, Chih-Yuan Chiu et al.

We present the concept of a Generalized Feedback Nash Equilibrium (GFNE) in dynamic games, extending the Feedback Nash Equilibrium concept to games in which players are subject to state and input constraints. We formalize necessary and sufficient conditions for (local) GFNE solutions at the trajectory level, which enable the development of efficient numerical methods for their computation. Specifically, we propose a Newton-style method for finding game trajectories which satisfy necessary conditions for an equilibrium, which can then be checked against sufficiency conditions. We show that the evaluation of the necessary conditions in general requires computing a series of nested, implicitly-defined derivatives, which quickly becomes intractable. To this end, we introduce an approximation to the necessary conditions which is amenable to efficient evaluation, and in turn, computation of solutions. We term the solutions to the approximate necessary conditions Generalized Feedback Quasi-Nash Equilibria (GFQNE), and we introduce numerical methods for their computation. In particular, we develop a Sequential Linear-Quadratic Game approach, in which a LQ local approximation of the game is solved at each iteration. The development of this method relies on the ability to compute a GFNE to inequality- and equality-constrained LQ games, and therefore specific methods for the solution of these special cases are developed in detail. We demonstrate the effectiveness of the proposed solution approach on a dynamic game arising in an autonomous driving application.

RONov 11, 2020
Multi-Hypothesis Interactions in Game-Theoretic Motion Planning

Forrest Laine, David Fridovich-Keil, Chih-Yuan Chiu et al.

We present a novel method for handling uncertainty about the intentions of non-ego players in dynamic games, with application to motion planning for autonomous vehicles. Equilibria in these games explicitly account for interaction among other agents in the environment, such as drivers and pedestrians. Our method models the uncertainty about the intention of other agents by constructing multiple hypotheses about the objectives and constraints of other agents in the scene. For each candidate hypothesis, we associate a Bernoulli random variable representing the probability of that hypothesis, which may or may not be independent of the probability of other hypotheses. We leverage constraint asymmetries and feedback information patterns to incorporate the probabilities of hypotheses in a natural way. Specifically, increasing the probability associated with a given hypothesis from $0$ to $1$ shifts the responsibility of collision avoidance from the hypothesized agent to the ego agent. This method allows the generation of interactive trajectories for the ego agent, where the level of assertiveness or caution that the ego exhibits is directly related to the easy-to-model uncertainty it maintains about the scene.

RONov 9, 2020
Encoding Defensive Driving as a Dynamic Nash Game

Chih-Yuan Chiu, David Fridovich-Keil, Claire J. Tomlin

Robots deployed in real-world environments should operate safely in a robust manner. In scenarios where an "ego" agent navigates in an environment with multiple other "non-ego" agents, two modes of safety are commonly proposed -- adversarial robustness and probabilistic constraint satisfaction. However, while the former is generally computationally intractable and leads to overconservative solutions, the latter typically relies on strong distributional assumptions and ignores strategic coupling between agents. To avoid these drawbacks, we present a novel formulation of robustness within the framework of general-sum dynamic game theory, modeled on defensive driving. More precisely, we prepend an adversarial phase to the ego agent's cost function. That is, we prepend a time interval during which other agents are assumed to be temporarily distracted, in order to render the ego agent's equilibrium trajectory robust against other agents' potentially dangerous behavior during this time. We demonstrate the effectiveness of our new formulation in encoding safety via multiple traffic scenarios.

SYNov 1, 2020
Approximate Solutions to a Class of Reachability Games

David Fridovich-Keil, Claire J. Tomlin

In this paper, we present a method for finding approximate Nash equilibria in a broad class of reachability games. These games are often used to formulate both collision avoidance and goal satisfaction. Our method is computationally efficient, running in real-time for scenarios involving multiple players and more than ten state dimensions. The proposed approach forms a family of increasingly exact approximations to the original game. Our results characterize the quality of these approximations and show operation in a receding horizon, minimally-invasive control context. Additionally, as a special case, our method reduces to local gradient-based optimization in the single-player (optimal control) setting, for which a wide variety of efficient algorithms exist.

LGApr 6, 2020
Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning

Tyler Westenbroek, Eric Mazumdar, David Fridovich-Keil et al.

This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time model-free policy-gradient parameter update rules. The primary advantage of the scheme over standard model-reference adaptive control techniques is that it does not require the learned inverse model to be invertible at all instances of time. This enables the use of general function approximators to approximate the linearizing controller for the system without having to worry about singularities. However, the discrete-time and stochastic nature of these algorithms precludes the direct application of standard machinery from the adaptive control literature to provide deterministic stability proofs for the system. Nevertheless, we leverage these techniques alongside tools from the stochastic approximation literature to demonstrate that with high probability the tracking and parameter errors concentrate near zero when a certain persistence of excitation condition is satisfied. A simulated example of a double pendulum demonstrates the utility of the proposed theory. 1

ROFeb 11, 2020
Inference-Based Strategy Alignment for General-Sum Differential Games

Lasse Peters, David Fridovich-Keil, Claire J. Tomlin et al.

In many settings where multiple agents interact, the optimal choices for each agent depend heavily on the choices of the others. These coupled interactions are well-described by a general-sum differential game, in which players have differing objectives, the state evolves in continuous time, and optimal play may be characterized by one of many equilibrium concepts, e.g., a Nash equilibrium. Often, problems admit multiple equilibria. From the perspective of a single agent in such a game, this multiplicity of solutions can introduce uncertainty about how other agents will behave. This paper proposes a general framework for resolving ambiguity between equilibria by reasoning about the equilibrium other agents are aiming for. We demonstrate this framework in simulations of a multi-player human-robot navigation problem that yields two main conclusions: First, by inferring which equilibrium humans are operating at, the robot is able to predict trajectories more accurately, and second, by discovering and aligning itself to this equilibrium the robot is able to reduce the cost for all players.

OCOct 29, 2019
Feedback Linearization for Unknown Systems via Reinforcement Learning

Tyler Westenbroek, David Fridovich-Keil, Eric Mazumdar et al.

We present a novel approach to control design for nonlinear systems which leverages model-free policy optimization techniques to learn a linearizing controller for a physical plant with unknown dynamics. Feedback linearization is a technique from nonlinear control which renders the input-output dynamics of a nonlinear plant \emph{linear} under application of an appropriate feedback controller. Once a linearizing controller has been constructed, desired output trajectories for the nonlinear plant can be tracked using a variety of linear control techniques. However, the calculation of a linearizing controller requires a precise dynamics model for the system. As a result, model-based approaches for learning exact linearizing controllers generally require a simple, highly structured model of the system with easily identifiable parameters. In contrast, the model-free approach presented in this paper is able to approximate the linearizing controller for the plant using general function approximation architectures. Specifically, we formulate a continuous-time optimization problem over the parameters of a learned linearizing controller whose optima are the set of parameters which best linearize the plant. We derive conditions under which the learning problem is (strongly) convex and provide guarantees which ensure the true linearizing controller for the plant is recovered. We then discuss how model-free policy optimization algorithms can be used to solve a discrete-time approximation to the problem using data collected from the real-world plant. The utility of the framework is demonstrated in simulation and on a real-world robotic platform.