RO · Sep 27, 2024
CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models
Kanghyun Ryu, Qiayuan Liao, Zhongyu Li et al.
Curriculum learning is a training mechanism in reinforcement learning (RL) that facilitates the achievement of complex policies by progressively increasing the task difficulty during training. However, designing effective curricula for a specific task often requires extensive domain knowledge and human intervention, which limits its applicability across various domains. Our core idea is that large language models (LLMs), with their extensive training on diverse language data and ability to encapsulate world knowledge, present significant potential for efficiently breaking down tasks and decomposing skills across various robotics environments. Additionally, the demonstrated success of LLMs in translating natural language into executable code for RL agents strengthens their role in generating task curricula. In this work, we propose CurricuLLM, which leverages the high-level planning and programming capabilities of LLMs for curriculum design, thereby enhancing the efficient learning of complex target tasks. CurricuLLM consists of: (Step 1) generating a sequence of subtasks, in natural language form, that aid target task learning, (Step 2) translating the natural language descriptions of subtasks into executable task code, including the reward code and goal distribution code, and (Step 3) evaluating trained policies based on trajectory rollouts and subtask descriptions. We evaluate CurricuLLM in various robotics simulation environments, spanning manipulation, navigation, and locomotion, to show that CurricuLLM can aid learning complex robot control tasks. In addition, we validate a humanoid locomotion policy learned through CurricuLLM in the real world. The project website is https://iconlab.negarmehr.com/CurricuLLM/
GT · Aug 15, 2023
Active Inverse Learning in Stackelberg Trajectory Games
William Ward, Yue Yu, Jacob Levy et al.
Game-theoretic inverse learning is the problem of inferring a player's objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates best describes the follower's objective function. Instead of using passively observed trajectories like existing methods, we actively maximize the differences in the follower's trajectories under different hypotheses by optimizing the leader's control inputs. Compared with uniformly random inputs, the optimized inputs accelerate the convergence of the estimated probability of different hypotheses conditioned on the follower's trajectory. We demonstrate the proposed method in a receding-horizon repeated trajectory game and simulate the results using virtual TurtleBots in Gazebo.
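The "estimated probability of different hypotheses conditioned on the follower's trajectory" can be illustrated with a plain Bayes update over a finite hypothesis set. The sketch below is a minimal illustration, not the authors' implementation: the isotropic Gaussian observation noise, the value of `sigma`, and all function names are assumptions made for the example. It shows why well-separated predicted trajectories make the posterior concentrate faster than nearly identical ones.

```python
import math

def gaussian_loglik(observed, predicted, sigma):
    """Log-likelihood of an observed trajectory under isotropic Gaussian noise
    (the noise model is an assumption of this sketch)."""
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    n = len(observed)
    return -0.5 * sq_err / sigma**2 - n * math.log(sigma * math.sqrt(2 * math.pi))

def update_posterior(prior, observed, predictions, sigma=0.1):
    """Bayes update over a finite set of hypotheses.

    prior:       strictly positive prior probabilities, one per hypothesis
    predictions: predicted follower trajectory under each hypothesis
    """
    log_post = [math.log(p) + gaussian_loglik(observed, pred, sigma)
                for p, pred in zip(prior, predictions)]
    m = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# The follower actually behaves according to hypothesis 0.
observed = [0.0, 0.1, 0.2]
# Leader inputs under which both hypotheses predict almost the same trajectory.
close = [[0.0, 0.1, 0.2], [0.0, 0.1, 0.21]]
# Leader inputs under which the predictions differ strongly.
far = [[0.0, 0.1, 0.2], [0.0, 0.5, 1.0]]

post_close = update_posterior([0.5, 0.5], observed, close)
post_far = update_posterior([0.5, 0.5], observed, far)
```

With the nearly identical predictions the posterior barely moves from the prior, while the separated predictions drive almost all probability mass onto hypothesis 0 after a single observation.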
RO · May 13
MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies
Dayi Dong, Maulik Bhatt, Seoyeon Choi et al.
As robots become more integrated in society, their ability to coordinate with other robots and humans on multi-modal tasks (those with multiple valid solutions) is crucial. Such behaviors can be learned from expert demonstrations via imitation learning (IL), but when expert demonstrations are multi-modal, standard IL approaches usually average across modes or collapse to a single mode, preventing effective coordination. Inspired by diffusion models' ability to capture complex multi-modal trajectory distributions in single-agent settings, we develop a diffusion-based framework for coordinated multi-modal behavior in multi-agent systems. However, existing multi-agent diffusion approaches typically require a centralized planner or explicit communication among agents. This assumption can fail in real-world scenarios where robots must operate independently or with agents like humans that they cannot directly communicate with. Therefore, we propose MIMIC-D, a joint training with decentralized execution paradigm for multi-modal multi-agent IL via diffusion. We jointly train all agents' policies with only local information to achieve implicit coordination. In simulation and hardware experiments, our method exhibits robust multi-modal coordination behavior in various tasks and environments, improving upon state-of-the-art baselines.
GT · Sep 30, 2023
When Should a Leader Act Suboptimally? The Role of Inferability in Repeated Stackelberg Games
Mustafa O. Karabag, Sophia Smith, Negar Mehr et al.
When interacting with other decision-making agents in non-adversarial scenarios, it is critical for an autonomous agent to have inferable behavior: The agent's actions must convey their intention and strategy. We model the inferability problem using Stackelberg games with observations where a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed mixed strategy. The follower does not know the leader's strategy and dynamically reacts to the statistically inferred strategy based on the leader's previous actions. In the inference setting, the leader may have a lower performance compared to the setting where the follower has full information on the leader's strategy. We refer to the performance gap between these settings as the inferability gap. For a variety of game settings, we show that the inferability gap is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. We also analyze bimatrix Stackelberg games and identify a set of games where the leader's near-optimal strategy may potentially suffer from a large inferability gap.
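The dependence on the number of interactions and on the stochasticity of the leader's strategy can be seen in a small simulation of the inference step alone. The sketch below is an illustrative assumption, not the paper's model: it takes the follower's estimate to be the empirical action frequencies and measures the error in total variation; the payoff side of the inferability gap is not modeled, and all names are hypothetical.

```python
import random
from collections import Counter

def empirical_strategy(actions, n_actions):
    """Follower's estimate of the leader's mixed strategy from observed actions."""
    counts = Counter(actions)
    return [counts[a] / len(actions) for a in range(n_actions)]

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mean_estimation_error(strategy, n_interactions, n_trials=2000, seed=0):
    """Average estimation error after n_interactions observed leader actions."""
    rng = random.Random(seed)
    action_ids = list(range(len(strategy)))
    err = 0.0
    for _ in range(n_trials):
        sample = rng.choices(action_ids, weights=strategy, k=n_interactions)
        err += total_variation(empirical_strategy(sample, len(strategy)), strategy)
    return err / n_trials

uniform_short = mean_estimation_error([0.5, 0.5], n_interactions=10)
uniform_long = mean_estimation_error([0.5, 0.5], n_interactions=100)
skewed_short = mean_estimation_error([0.9, 0.1], n_interactions=10)
deterministic = mean_estimation_error([1.0, 0.0], n_interactions=10)
```

In this toy setting the error shrinks as interactions accumulate, is smaller for the less stochastic strategy at the same number of interactions, and vanishes for a deterministic leader.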
LG · Feb 24
Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning
Antoine Bergerault, Volkan Cevher, Negar Mehr
Multi-agent imitation learning (MA-IL) aims to learn optimal policies from expert demonstrations of interactions in multi-agent interactive domains. Despite existing guarantees on the performance of the resulting learned policies, characterizations of how far the learned policies are from a Nash equilibrium are missing for offline MA-IL. In this paper, we demonstrate impossibility and hardness results of learning low-exploitable policies in general $n$-player Markov Games. We do so by providing examples where even exact measure matching fails, and demonstrating a new hardness result on characterizing the Nash gap given a fixed measure matching error. We then show how these challenges can be overcome using strategic dominance assumptions on the expert equilibrium. Specifically, for the case of dominant strategy expert equilibria, assuming Behavioral Cloning error $\varepsilon_{\text{BC}}$, this provides a Nash imitation gap of $\mathcal{O}\left(n\varepsilon_{\text{BC}}/(1-\gamma)^2\right)$ for a discount factor $\gamma$. We generalize this result with a new notion of best-response continuity, and argue that this is implicitly encouraged by standard regularization techniques.
RO · May 16
TACO: Temporal Consensus Optimization for Continual Neural Mapping
Xunlan Zhou, Hongrui Zhao, Negar Mehr
Neural implicit mapping has emerged as a powerful paradigm for robotic navigation and scene understanding. However, real-world robotic deployment requires continual adaptation to changing environments under strict memory and computation constraints, which existing mapping systems fail to support. Most prior methods rely on replaying historical observations to preserve consistency and assume static scenes. As a result, they cannot adapt to continual learning in dynamic robotic settings. To address these challenges, we propose TACO (TemporAl Consensus Optimization), a replay-free framework for continual neural mapping. We reformulate mapping as a temporal consensus optimization problem, where we treat past model snapshots as temporal neighbors. Intuitively, our approach resembles a model consulting its own past knowledge. We update the current map by enforcing weighted consensus with historical representations. Our method allows reliable past geometry to constrain optimization while permitting unreliable or outdated regions to be revised in response to new observations. TACO achieves a balance between memory efficiency and adaptability without storing or replaying previous data. Through extensive simulated and real-world experiments, we show that TACO robustly adapts to scene changes, and consistently outperforms other continual learning baselines.
RO · Apr 24
Learning Control Policies to Provably Satisfy Hard Affine Constraints for Black-Box Hybrid Dynamical Systems
Aayushi Shrivastava, Kartik Nagpal, Sairam Jinkala et al.
Ensuring safety for black-box hybrid dynamical systems presents significant challenges due to their instantaneous state jumps and unknown explicit nonlinear dynamics. Existing solutions for strict safety constraint satisfaction, like control barrier functions (CBFs) and reachability analysis, rely on direct knowledge of the dynamics. Similarly, safe reinforcement learning (RL) approaches often rely on known system dynamics or merely discourage safety violations through reward shaping. In this work, we want to learn RL policies which provably satisfy affine state constraints in closed loop for black-box hybrid dynamical systems with affine reset maps. Our key insight is forcing the RL policy to be affine and repulsive near the constraint boundaries for the unknown nonlinear dynamics of the system, providing guarantees that the trajectories will not violate the constraint. We further account for constraint violation due to instantaneous state jumps that occur due to impacts or reset maps in the hybrid system by introducing a second repulsive affine region before the reset that prevents post-reset states from violating the constraint. We derive sufficient conditions under which these policies satisfy safety constraints in closed loop. We also compare our approach with state-of-the-art reward shaping and learned-CBF methods on hybrid dynamical systems like the constrained pendulum and paddle juggler environments. In both scenarios, we show that our methodology learns higher quality policies while always satisfying the safety constraints.
RO · Sep 30, 2024
Distributed NeRF Learning for Collaborative Multi-Robot Perception
Hongrui Zhao, Boris Ivanovic, Negar Mehr
Effective environment perception is crucial for enabling downstream robotic applications. Individual robotic agents often face occlusion and limited visibility issues, whereas multi-agent systems can offer a more comprehensive mapping of the environment, quicker coverage, and increased fault tolerance. In this paper, we propose a collaborative multi-agent perception system where agents collectively learn a neural radiance field (NeRF) from posed RGB images to represent a scene. Each agent processes its local sensory data and shares only its learned NeRF model with other agents, reducing communication overhead. Given NeRF's low memory footprint, this approach is well-suited for robotic systems with limited bandwidth, where transmitting all raw data is impractical. Our distributed learning framework ensures consistency across agents' local NeRF models, enabling convergence to a unified scene representation. We show the effectiveness of our method through an extensive set of experiments on datasets containing challenging real-world scenes, achieving performance comparable to centralized mapping of the environment where data is sent to a central server for processing. Additionally, we find that multi-agent learning provides regularization benefits, improving geometric consistency in scenarios with sparse input views. We show that in such scenarios, multi-agent mapping can even outperform centralized training.
RO · Mar 19
UDON: Uncertainty-weighted Distributed Optimization for Multi-Robot Neural Implicit Mapping under Extreme Communication Constraints
Hongrui Zhao, Xunlan Zhou, Boris Ivanovic et al.
Multi-robot mapping with neural implicit representations enables the compact reconstruction of complex environments. However, it demands robustness against communication challenges like packet loss and limited bandwidth. While prior works have introduced various mechanisms to mitigate communication disruptions, performance degradation still occurs under extremely low communication success rates. This paper presents UDON, a real-time multi-agent neural implicit mapping framework that introduces a novel uncertainty-weighted distributed optimization to achieve high-quality mapping under severe communication deterioration. The uncertainty weighting prioritizes more reliable portions of the map, while the distributed optimization isolates and penalizes mapping disagreement between individual pairs of communicating agents. We conduct extensive experiments on standard benchmark datasets and real-world robot hardware. We demonstrate that UDON significantly outperforms existing baselines, maintaining high-fidelity reconstructions and consistent scene representations even under extreme communication degradation (as low as 1% success rate).
MA · Feb 24, 2025
Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment
Kartik Nagpal, Dayi Dong, Jean-Baptiste Bouvier et al.
Recent work, spanning from autonomous vehicle coordination to in-space assembly, has shown the importance of learning collaborative behavior for enabling robots to achieve shared goals. A common approach for learning this cooperative behavior is to utilize the centralized-training decentralized-execution paradigm. However, this approach also introduces a new challenge: how do we evaluate the contributions of each agent's actions to the overall success or failure of the team? This credit assignment problem has remained open, and has been extensively studied in the Multi-Agent Reinforcement Learning literature. In fact, humans manually inspecting agent behavior often generate better credit evaluations than existing methods. We combine this observation with recent works which show that Large Language Models demonstrate human-level performance at many pattern recognition tasks. Our key idea is to reformulate credit assignment as the two pattern recognition problems of sequence improvement and attribution, which motivates our novel LLM-MCA method. Our approach utilizes a centralized LLM reward-critic which numerically decomposes the environment reward based on the individualized contribution of each agent in the scenario. We then update the agents' policy networks based on this feedback. We also propose an extension LLM-TACA where our LLM critic performs explicit task assignment by passing an intermediary goal directly to each agent policy in the scenario. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints. As an artifact of our methods, we generate large trajectory datasets with each timestep annotated with per-agent reward information, as sampled from our LLM critics.
RO · Dec 6, 2023
Weathering Ongoing Uncertainty: Learning and Planning in a Time-Varying Partially Observable Environment
Gokul Puthumanaillam, Xiangyu Liu, Negar Mehr et al.
Optimal decision-making presents a significant challenge for autonomous systems operating in uncertain, stochastic and time-varying environments. Environmental variability over time can significantly impact the system's optimal decision-making strategy for mission completion. To model such environments, our work combines the previous notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability and introduces Time-Varying Partially Observable Markov Decision Processes (TV-POMDP). We propose a two-pronged approach to accurately estimate and plan within the TV-POMDP: 1) Memory Prioritized State Estimation (MPSE), which leverages weighted memory to provide more accurate time-varying transition estimates; and 2) an MPSE-integrated planning strategy that optimizes long-term rewards while accounting for temporal constraints. We validate the proposed framework and algorithms using simulations and hardware, with robots exploring partially observable, time-varying environments. Our results demonstrate superior performance over standard methods, highlighting the framework's effectiveness in stochastic, uncertain, time-varying domains.
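To make the idea of "weighted memory" for time-varying transition estimates concrete, here is a toy sketch. The abstract does not say how MPSE weights its memory, so the exponential recency weighting, the `decay` parameter, and the function name below are all assumptions chosen for illustration; this is not the authors' estimator.

```python
from collections import defaultdict

def weighted_transition_estimate(transitions, now, decay=0.9):
    """Estimate P(next_state | state, action) from time-stamped experience.

    transitions: list of (t, state, action, next_state) tuples
    decay:       hypothetical recency factor; 1.0 recovers plain counting
    """
    weights = defaultdict(float)
    for t, s, a, s_next in transitions:
        # Older samples contribute less when decay < 1.
        weights[(s, a, s_next)] += decay ** (now - t)
    totals = defaultdict(float)
    for (s, a, _), w in weights.items():
        totals[(s, a)] += w
    return {key: w / totals[(key[0], key[1])] for key, w in weights.items()}

# The environment changes at t = 5: action "go" in state "A" used to lead
# to "B" and now leads to "C".
history = [(t, "A", "go", "B") for t in range(5)] + \
          [(t, "A", "go", "C") for t in range(5, 10)]

recent = weighted_transition_estimate(history, now=10, decay=0.5)
uniform = weighted_transition_estimate(history, now=10, decay=1.0)
```

With `decay=0.5` the estimate puts most of its mass on the post-change outcome "C", whereas uniform counting splits the mass evenly between the stale and current dynamics.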
RO · May 23, 2024
Adaptive Teaching in Heterogeneous Agents: Balancing Surprise in Sparse Reward Scenarios
Emma Clark, Kanghyun Ryu, Negar Mehr
Learning from Demonstration (LfD) can be an efficient way to train systems with analogous agents by enabling "Student" agents to learn from the demonstrations of the most experienced "Teacher" agent, instead of training their policy in parallel. However, when there are discrepancies in agent capabilities, such as divergent actuator power or joint angle constraints, naively replicating demonstrations that are out of bounds for the Student's capability can limit efficient learning. We present a Teacher-Student learning framework specifically tailored to address the challenge of heterogeneity between the Teacher and Student agents. Our framework is based on the concept of "surprise", inspired by its application in exploration incentivization in sparse-reward environments. Surprise is repurposed to enable the Teacher to detect and adapt to differences between itself and the Student. By focusing on maximizing its surprise in response to the environment while concurrently minimizing the Student's surprise in response to the demonstrations, the Teacher agent can effectively tailor its demonstrations to the Student's specific capabilities and constraints. We validate our method by demonstrating improvements in the Student's learning in control tasks within sparse-reward environments.
LG · Dec 11, 2021
Learning Contraction Policies from Offline Data
Navid Rezazadeh, Maxwell Kolarich, Solmaz S. Kia et al.
This paper proposes a data-driven method for learning convergent control policies from offline data using contraction theory. Contraction theory enables constructing a policy that makes the closed-loop system trajectories inherently convergent towards a unique trajectory. At the technical level, identifying the contraction metric, which is the distance metric with respect to which a robot's trajectories exhibit contraction, is often non-trivial. We propose to jointly learn the control policy and its corresponding contraction metric while enforcing contraction. To achieve this, we learn an implicit dynamics model of the robotic system from an offline data set consisting of the robot's state and input trajectories. Using this learned dynamics model, we propose a data augmentation algorithm for learning contraction policies. We randomly generate samples in the state-space and propagate them forward in time through the learned dynamics model to generate auxiliary sample trajectories. We then learn both the control policy and the contraction metric such that the distance between the trajectories from the offline data set and our generated auxiliary sample trajectories decreases over time. We evaluate the performance of our proposed framework on simulated robotic goal-reaching tasks and demonstrate that enforcing contraction results in faster convergence and greater robustness of the learned policy.
OC · Oct 3, 2021
Maximum-Entropy Multi-Agent Dynamic Games: Forward and Inverse Solutions
Negar Mehr, Mingyu Wang, Mac Schwager
In this paper, we study the problem of multiple stochastic agents interacting in a dynamic game scenario with continuous state and action spaces. We define a new notion of stochastic Nash equilibrium for boundedly rational agents, which we call the Entropic Cost Equilibrium (ECE). We show that ECE is a natural extension to multiple agents of Maximum Entropy optimality for single agents. We solve both the "forward" and "inverse" problems for the multi-agent ECE game. For the forward problem, we provide a Riccati algorithm to compute closed-form ECE feedback policies for the agents, which are exact in the Linear-Quadratic-Gaussian case. We give an iterative variant to find locally ECE feedback policies for the nonlinear case. For the inverse problem, we present an algorithm to infer the cost functions of the multiple interacting agents given noisy, boundedly rational input and state trajectory examples from agents acting in an ECE. The effectiveness of our algorithms is demonstrated in a simulated multi-agent collision avoidance scenario, and with data from the INTERACTION traffic dataset. In both cases, we show that, by taking into account the agents' game theoretic interactions using our algorithm, a more accurate model of agents' costs can be learned, compared with standard inverse optimal control methods.
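For readers unfamiliar with the single-agent notion that ECE extends, the sketch below shows maximum-entropy (Boltzmann) action selection over a finite action set: actions are chosen with probability proportional to the exponential of their negative cost. This is only the standard single-agent, discrete-action idea; it is not the paper's continuous-space Riccati algorithm, and the `temperature` parameter and function name are assumptions of the example.

```python
import math

def max_entropy_policy(costs, temperature=1.0):
    """Boltzmann distribution over actions: p(a) proportional to exp(-cost(a) / temperature)."""
    scaled = [-c / temperature for c in costs]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    return [w / z for w in weights]

costs = [1.0, 2.0, 4.0]
noisy = max_entropy_policy(costs, temperature=1.0)    # boundedly rational
sharp = max_entropy_policy(costs, temperature=0.01)   # nearly deterministic
```

At moderate temperature the agent favors the cheapest action but still places probability on the others, which is one way to model noisy, boundedly rational behavior; as the temperature goes to zero the policy approaches the cost-minimizing action.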
MA · Sep 29, 2021
Decentralized Role Assignment in Multi-Agent Teams via Empirical Game-Theoretic Analysis
Fengjun Yang, Negar Mehr, Mac Schwager
We propose a method, based on empirical game theory, for a robot operating as part of a team to choose its role within the team without explicitly communicating with team members, by leveraging its knowledge about the team structure. To do this, we formulate the role assignment problem as a dynamic game, and borrow tools from empirical game-theoretic analysis to analyze such games. Based on this game-theoretic formulation, we propose a distributed controller for each robot to dynamically decide on the best role to take. We demonstrate our method in simulations of a collaborative planar manipulation scenario in which each agent chooses from a set of feedback control policies at each instant. The agents can effectively collaborate without communication to manipulate the object while also avoiding collisions using our method.
RO · Jul 10, 2021
Potential iLQR: A Potential-Minimizing Controller for Planning Multi-Agent Interactive Trajectories
Talha Kavuncu, Ayberk Yaraneri, Negar Mehr
Many robotic applications involve interactions between multiple agents where an agent's decisions affect the behavior of other agents. Such behaviors can be captured by the equilibria of differential games which provide an expressive framework for modeling the agents' mutual influence. However, finding the equilibria of differential games is in general challenging as it involves solving a set of coupled optimal control problems. In this work, we propose to leverage the special structure of multi-agent interactions to generate interactive trajectories by simply solving a single optimal control problem, namely, the optimal control problem associated with minimizing the potential function of the differential game. Our key insight is that for a certain class of multi-agent interactions, the underlying differential game is indeed a potential differential game for which equilibria can be found by solving a single optimal control problem. We introduce such an optimal control problem and build on single-agent trajectory optimization methods to develop a computationally tractable and scalable algorithm for planning multi-agent interactive trajectories. We will demonstrate the performance of our algorithm in simulation and show that our algorithm outperforms the state-of-the-art game solvers. To further show the real-time capabilities of our algorithm, we will demonstrate the application of our proposed algorithm in a set of experiments involving interactive trajectories for two quadcopters.
RO · Oct 16, 2020
RAT iLQR: A Risk Auto-Tuning Controller to Optimally Account for Stochastic Model Mismatch
Haruki Nishimura, Negar Mehr, Adrien Gaidon et al.
Successful robotic operation in stochastic environments relies on accurate characterization of the underlying probability distributions, yet this is often imperfect due to limited knowledge. This work presents a control algorithm that is capable of handling such distributional mismatches. Specifically, we propose a novel nonlinear MPC for distributionally robust control, which plans locally optimal feedback policies against a worst-case distribution within a given KL divergence bound from a Gaussian distribution. Leveraging mathematical equivalence between distributionally robust control and risk-sensitive optimal control, our framework also provides an algorithm to dynamically adjust the risk-sensitivity level online for risk-sensitive control. The benefits of the distributional robustness as well as the automatic risk-sensitivity adjustment are demonstrated in a dynamic collision avoidance scenario where the predictive distribution of human motion is erroneous.
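The equivalence between distributionally robust and risk-sensitive control mentioned above rests on a standard variational identity: for a risk parameter theta > 0, the entropic risk (1/theta) log E_p[exp(theta c)] equals the maximum over distributions q of E_q[c] - KL(q || p) / theta, attained by the exponentially tilted distribution q_i proportional to p_i exp(theta c_i). The sketch below checks this numerically on a discrete distribution. It is an illustration of the identity only, not the paper's MPC algorithm, which works with a Gaussian reference distribution; the numbers and function names are made up for the example.

```python
import math

def entropic_risk(costs, probs, theta):
    """(1/theta) * log E_p[exp(theta * cost)], computed stably; requires theta > 0."""
    m = max(costs)
    s = sum(p * math.exp(theta * (c - m)) for c, p in zip(costs, probs))
    return m + math.log(s) / theta

def worst_case_distribution(costs, probs, theta):
    """Exponentially tilted distribution q_i proportional to p_i * exp(theta * c_i)."""
    m = max(costs)
    w = [p * math.exp(theta * (c - m)) for c, p in zip(costs, probs)]
    z = sum(w)
    return [x / z for x in w]

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions with p_i > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

costs = [1.0, 2.0, 10.0]      # e.g. a rare, expensive collision outcome
probs = [0.7, 0.25, 0.05]     # nominal (possibly wrong) model
theta = 0.5

risk = entropic_risk(costs, probs, theta)
q = worst_case_distribution(costs, probs, theta)
robust_value = sum(qi * c for qi, c in zip(q, costs)) - kl_divergence(q, probs) / theta
nominal_mean = sum(p * c for p, c in zip(probs, costs))
```

Here `risk` and `robust_value` agree up to floating-point error, and both exceed the nominal expected cost because the tilted distribution shifts mass toward the expensive outcome.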
GT · Apr 2, 2019
Pricing Traffic Networks with Mixed Vehicle Autonomy
Negar Mehr, Roberto Horowitz
In a traffic network, vehicles normally select their routes selfishly. Consequently, traffic networks normally operate at an equilibrium characterized by Wardrop conditions. However, it is well known that equilibria are inefficient in general. In addition to the intrinsic inefficiency of equilibria, the authors recently showed that, in mixed-autonomy networks in which autonomous vehicles maintain a shorter headway than human-driven cars, increasing the fraction of autonomous vehicles in the network may increase the inefficiency of equilibria. In this work, we study the possibility of obviating the inefficiency of equilibria in mixed-autonomy traffic networks via pricing mechanisms. In particular, we study assigning prices to network links such that the overall or social delay of the resulting equilibria is minimum. First, we study the possibility of inducing such optimal equilibria by imposing a set of undifferentiated prices, i.e. a set of prices that treat both human-driven and autonomous vehicles similarly at each link. We provide an example which demonstrates that undifferentiated pricing is not sufficient for achieving minimum social delay. Then, we study differentiated pricing where the price of traversing each link may depend on whether vehicles are human-driven or autonomous. Under differentiated pricing, we prove that link prices obtained from the marginal cost taxation of links will induce equilibria with minimum social delay if the degree of road capacity asymmetry (i.e. the ratio between the road capacity when all vehicles are human-driven and the road capacity when all vehicles are autonomous) is homogeneous among network links.
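Marginal cost taxation can be illustrated on the classical two-link Pigou network with a single vehicle class. This is a textbook example chosen for illustration, not the paper's mixed-autonomy model: link 1 has delay equal to its flow `x`, link 2 has constant delay 1, total demand is 1, and the marginal-cost price on a link is its flow times the derivative of its delay. All function names are hypothetical.

```python
def equilibrium_flow_on_link1(toll_fn, demand=1.0, tol=1e-10):
    """Wardrop equilibrium flow on link 1, found by bisection.

    Link 1: delay x plus price toll_fn(x). Link 2: constant delay 1, no price.
    """
    def gap(x):
        # Perceived cost of link 1 minus perceived cost of link 2.
        return (x + toll_fn(x)) - 1.0

    lo, hi = 0.0, demand
    if gap(hi) <= 0:
        return hi  # link 1 is never worse, so all traffic uses it
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def social_delay(x, demand=1.0):
    """Total delay: flow on link 1 times x, plus flow on link 2 times 1."""
    return x * x + (demand - x) * 1.0

# No prices: everyone takes link 1.
x_free = equilibrium_flow_on_link1(lambda x: 0.0)
# Marginal-cost price on link 1: x * d(delay)/dx = x. Link 2's price is zero.
x_toll = equilibrium_flow_on_link1(lambda x: x)

delay_free = social_delay(x_free)   # 1.0
delay_toll = social_delay(x_toll)   # approximately 0.75
```

Without prices the equilibrium sends all flow to link 1 for a social delay of 1, while the marginal-cost price splits flow evenly and reaches the minimum social delay of 0.75.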
GT · Sep 8, 2018
A Game Theoretic Macroscopic Model of Bypassing at Traffic Diverges with Applications to Mixed Autonomy Networks
Negar Mehr, Ruolin Li, Roberto Horowitz
Vehicle bypassing is known to negatively affect delays at traffic diverges. However, due to the complexities of this phenomenon, accurate and yet simple models of such lane change maneuvers are hard to develop. In this work, we present a macroscopic model for predicting the number of vehicles that bypass at a traffic diverge. We take into account the selfishness of vehicles in selecting their lanes; every vehicle selects lanes such that its own cost is minimized. We discuss how we model the costs experienced by the vehicles. Then, taking into account the selfish behavior of the vehicles, we model the lane choice of vehicles at a traffic diverge as a Wardrop equilibrium. We state and prove the properties of Wardrop equilibrium in our model. We show that there always exists an equilibrium for our model. Moreover, unlike most nonlinear asymmetrical routing games, we prove that the equilibrium is unique under mild assumptions. We discuss how our model can be easily calibrated by running a simple optimization problem. We validate our calibrated model through simulation studies and demonstrate that it successfully predicts the aggregate lane change maneuvers that are performed by vehicles for bypassing at a traffic diverge. We further discuss how our model can be employed to obtain the optimal lane choice behavior of the vehicles, where the social or total cost of vehicles is minimized. Finally, we demonstrate how our model can be utilized in scenarios where a central authority can dictate the lane choice and trajectory of certain vehicles so as to increase the overall vehicle mobility at a traffic diverge. Examples of such scenarios include the case when both human-driven and autonomous vehicles coexist in the network. We show how certain decisions of the central authority can affect the total delays in such scenarios via an example.