LG · Mar 12
Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
Jiaheng Hu, Jay Shim, Chen Tang et al.
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
AI · Dec 3, 2025
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Chandler Smith, Marwa Abdulhai, Manfred Diaz et al.
Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. Our method measures general cooperative intelligence by testing an agent's ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.
RO · Mar 4
Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning
Yoonwoo Kim, Raghav Arora, Roberto Martín-Martín et al.
Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be found together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 62.7% in planning and execution time in simulation, and 72.6% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.
AI · May 29, 2025
ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork
Caroline Wang, Arrasy Rahman, Jiaxun Cui et al.
Learning to collaborate with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches often adopt a two-stage pipeline, where first, a fixed population of teammates is generated with the idea that they should be representative of the teammates that will be seen at deployment time, and second, an AHT agent is trained to collaborate well with agents in the population. To date, the research community has focused on designing separate algorithms for each stage. This separation has led to algorithms that generate teammates with limited coverage of possible behaviors, and that ignore whether the generated teammates are easy to learn from for the AHT agent. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static, thus attempting to generalize to previously unseen partner agents without assuming any control over the set of training teammates. This paper presents a unified framework for AHT by reformulating the problem as an open-ended learning process between an AHT agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Experiments across diverse two-player environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.
RO · Oct 19, 2021
Towards Optimal Correlational Object Search
Kaiyu Zheng, Rohan Chitnis, Yoonchang Sung et al.
In realistic applications of object search, robots will need to locate target objects in complex environments while coping with unreliable sensors, especially for small or hard-to-detect objects. In such settings, correlational information can be valuable for planning efficiently. Previous approaches that consider correlational information typically resort to ad-hoc, greedy search strategies. We introduce the Correlational Object Search POMDP (COS-POMDP), which models correlations while preserving optimal solutions with a reduced state space. We propose a hierarchical planning algorithm to scale up COS-POMDPs for practical domains. Our evaluation, conducted with the AI2-THOR household simulator and the YOLOv5 object detector, shows that our method finds objects more successfully and efficiently compared to baselines, particularly for hard-to-detect objects such as scrub brush and remote control.
RO · Mar 26, 2021
Reactive Task and Motion Planning under Temporal Logic Specifications
Shen Li, Daehyung Park, Yoonchang Sung et al.
We present a task-and-motion planning (TAMP) algorithm robust against a human operator's cooperative or adversarial interventions. Interventions often invalidate the current plan and require replanning on the fly. Replanning can be computationally expensive and often interrupts seamless task execution. We introduce a dynamically reconfigurable planning methodology with behavior tree-based control strategies toward reactive TAMP, which takes advantage of previous plans and incremental graph search during temporal logic-based reactive synthesis. Our algorithm also shows efficient recovery functionalities that minimize the number of replanning steps. Finally, our algorithm produces a robust, efficient, and complete TAMP solution. Our experimental results show that the algorithm achieves superior manipulation performance in both simulated and real-world tasks.
RO · Mar 7, 2021
Learning When to Quit: Meta-Reasoning for Motion Planning
Yoonchang Sung, Leslie Pack Kaelbling, Tomás Lozano-Pérez
Anytime motion planners are widely used in robotics. However, the relationship between their solution quality and computation time is not well understood, and thus, determining when to quit planning and start execution is unclear. In this paper, we address the problem of deciding when to stop deliberation under bounded computational capacity, so-called meta-reasoning, for anytime motion planning. We propose data-driven learning methods, model-based and model-free meta-reasoning, that are applicable to different environment distributions and agnostic to the choice of anytime motion planners. As part of the framework, we design a convolutional neural network-based optimal solution predictor that predicts the optimal path length from a given 2D workspace image. We empirically evaluate the performance of the proposed methods in simulation in comparison with baselines.
RO · May 6, 2020
Multi-Resolution POMDP Planning for Multi-Object Search in 3D
Kaiyu Zheng, Yoonchang Sung, George Konidaris et al.
Robots operating in households must find objects on shelves, under tables, and in cupboards. In such environments, it is crucial to search efficiently at 3D scale while coping with limited field of view and the complexity of searching for multiple objects. Principled approaches to object search frequently use Partially Observable Markov Decision Processes (POMDPs) as the underlying framework for computing search strategies, but constrain the search space to 2D. In this paper, we present a POMDP formulation for multi-object search in a 3D region with a frustum-shaped field-of-view. To efficiently solve this POMDP, we propose a multi-resolution planning algorithm based on online Monte-Carlo tree search. In this approach, we design a novel octree-based belief representation to capture uncertainty of the target objects at different resolution levels, then derive abstract POMDPs at lower resolutions with dramatically smaller state and observation spaces. Evaluation in a simulated 3D domain shows that our approach finds objects more efficiently and successfully compared to a set of baselines without resolution hierarchy in larger instances under the same computational requirement. We demonstrate our approach on a mobile robot to find objects placed at different heights in two $10\,\mathrm{m}^2 \times 2\,\mathrm{m}$ regions by moving its base and actuating its torso.
RO · Sep 18, 2019
Environmental Hotspot Identification in Limited Time with a UAV Equipped with a Downward-Facing Camera
Yoonchang Sung, Deeksha Dixit, Pratap Tokekar
Our work is motivated by environmental monitoring tasks, where finding the global maximum (i.e., hotspot) of a spatially varying field is crucial. We investigate the problem of identifying the hotspot for fields that can be sensed using an Unmanned Aerial Vehicle (UAV) equipped with a downward-facing camera. The UAV has a limited time budget which it can use for learning the unknown field and identifying the hotspot. Our contribution is to show how this problem can be formulated as a novel multi-fidelity variant of the Gaussian Process (GP) multi-armed bandit problem. The novelty is two-fold: (i) unlike standard multi-armed bandit settings, the rewards of the arms are correlated with each other; and (ii) unlike standard GP regression, the measurements in our problem are images (i.e., vector measurements) whose quality depends on the altitude of the UAV. We present a strategy for finding the sequence of UAV sensing locations and empirically compare it with several baselines. Experimental results using images gathered onboard a UAV are also presented and the scalability of the proposed methodology is assessed in a large-scale simulated environment in Gazebo.
RO · Dec 23, 2018
GM-PHD Filter for Searching and Tracking an Unknown Number of Targets with a Mobile Sensor with Limited FOV
Yoonchang Sung, Pratap Tokekar
We study the problem of searching for and tracking a collection of moving targets using a robot with a limited Field-Of-View (FOV) sensor. The actual number of targets present in the environment is not known a priori. We propose a search and tracking framework based on the concept of Bayesian Random Finite Sets (RFSs). Specifically, we generalize the Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter, which was previously applied to tracking problems, to allow for simultaneous search and tracking with a limited FOV sensor. The proposed framework can extract individual target tracks as well as estimate the number and the spatial density of targets. We also show how to use Gaussian Process (GP) regression to extract and predict non-linear target trajectories in this framework. We demonstrate the efficacy of our techniques through representative simulations and real data collected from an aerial robot.
RO · Dec 22, 2018
Distributed Assignment with Limited Communication for Multi-Robot Multi-Target Tracking
Yoonchang Sung, Ashish Kumar Budhiraja, Ryan K. Williams et al.
We study the problem of tracking multiple moving targets using a team of mobile robots. Each robot has a set of motion primitives to choose from in order to collectively maximize the number of targets tracked or the total quality of tracking. Our focus is on scenarios where communication is limited and the robots have limited time to share information with their neighbors. As a result, we seek distributed algorithms that can find solutions in a bounded amount of time. We present two algorithms: (1) a greedy algorithm that is guaranteed to find a $2$-approximation to the optimal (centralized) solution, albeit requiring $|R|$ communication rounds in the worst case, where $|R|$ denotes the number of robots; and (2) a local algorithm that finds a $\mathcal{O}\left((1+\epsilon)(1+1/h)\right)$-approximation in $\mathcal{O}(h\log 1/\epsilon)$ communication rounds. Here, $h$ and $\epsilon$ are parameters that allow the user to trade off solution quality against communication time. In addition to theoretical results, we present an empirical evaluation including comparisons with centralized optimal solutions.
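As a rough illustration of the greedy idea mentioned in this abstract, the sketch below lets robots choose motion primitives one at a time, each picking the primitive that tracks the most targets not yet tracked by earlier robots. It is a minimal sketch and not the authors' implementation: the `coverage` table, the robot and primitive names, and the fixed robot ordering are all hypothetical, and the objective is simplified to counting tracked targets.

```python
from typing import Dict, Set, Tuple


def greedy_assignment(
    coverage: Dict[str, Dict[str, Set[str]]],
) -> Tuple[Dict[str, str], Set[str]]:
    """Pick, robot by robot, the primitive that adds the most new targets.

    coverage[robot][primitive] is a hypothetical set of targets the robot
    would track if it executed that motion primitive.
    """
    chosen: Dict[str, str] = {}
    tracked: Set[str] = set()
    # One robot decides per round, so the loop runs |R| times in total.
    for robot in sorted(coverage):
        best_primitive = max(
            sorted(coverage[robot]),
            key=lambda p: len(coverage[robot][p] - tracked),
        )
        chosen[robot] = best_primitive
        tracked |= coverage[robot][best_primitive]
    return chosen, tracked


# Hypothetical two-robot, four-target instance.
coverage = {
    "r1": {"left": {"t1", "t2"}, "right": {"t3"}},
    "r2": {"left": {"t2"}, "right": {"t3", "t4"}},
}
assignment, tracked = greedy_assignment(coverage)
print(assignment, sorted(tracked))
```

Each robot needs to know only what earlier robots have already covered, which is why a sequential scheme of this kind takes one communication round per robot.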
RO · Nov 7, 2018
Online Exploration of an Unknown Region of Interest with a Team of Aerial Robots
Yoonchang Sung, Deeksha Dixit, Pratap Tokekar
In this paper, we study the problem of exploring an unknown Region Of Interest (ROI) with a team of aerial robots. The size and shape of the ROI are unknown to the robots. The objective is to find a tour for each robot such that each point in the ROI is visible from the field-of-view of some robot along its tour. In conventional exploration using ground robots, the ROI boundary is typically also treated as an obstacle and robots are naturally constrained to the interior of this ROI. Instead, we study the case where aerial robots are not restricted to flying inside the ROI (and can fly over the boundary of the ROI). We propose a recursive depth-first search-based algorithm that yields a constant competitive ratio for the exploration problem. Our analysis also extends to the case where the ROI is translating, e.g., in the case of marine plumes. In the simpler version of the problem where the ROI is modeled as a 2D grid, the competitive ratio is $\frac{2(S_r+S_p)(R+\lfloor\log{R}\rfloor)}{(S_r-S_p)(1+\lfloor\log{R}\rfloor)}$ where $R$ is the number of robots, and $S_r$ and $S_p$ are the robot speed and the ROI speed, respectively. We also consider a more realistic scenario where the ROI shape is not restricted to grid cells but can be an arbitrary shape. We show that our algorithm has a competitive ratio of $\frac{2(S_r+S_p)(18R+\lfloor\log{R}\rfloor)}{(S_r-S_p)(1+\lfloor\log{R}\rfloor)}$ under some conditions. We empirically verify our algorithm using simulations as well as a proof-of-concept experiment mapping a 2D ROI using an aerial robot with a downward-facing camera.
RO · Jul 25, 2018
Tree Search Techniques for Minimizing Detectability and Maximizing Visibility
Zhongshun Zhang, Yoonchang Sung, Lifeng Zhou et al.
We introduce and study the problem of planning a trajectory for an agent to carry out a scouting mission while avoiding being detected by an adversarial guard. This introduces an adversarial version of classical visibility-based planning problems such as the Watchman Route Problem. The agent receives a positive reward for increasing its visibility and a negative penalty when it is detected by the guard. The objective is to find a finite-horizon path for the agent that balances the trade-off between maximizing visibility and minimizing detectability. We model this problem as a sequential two-player zero-sum discrete game. A minimax tree search can give the optimal policy for the agent but requires exponential time and space. We propose several pruning techniques to reduce the computational cost while still preserving optimality guarantees. Simulation results show that the proposed strategy reduces the number of nodes by approximately three orders of magnitude compared to the brute-force strategy.
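To make the idea of pruning without losing optimality concrete, here is a minimal sketch that compares plain minimax with standard alpha-beta pruning on a tiny game tree. The abstract does not describe the paper's own pruning techniques, so alpha-beta is used here only as a well-known stand-in; the tree and its leaf payoffs (imagined as visibility reward minus detection penalty) are hypothetical.

```python
import math
from typing import List, Union

# A tree is either a leaf payoff (int) or a list of child trees.
Tree = Union[int, list]


def minimax(node: Tree, maximizing: bool, counter: List[int]) -> float:
    """Brute-force minimax: visits every node of the tree."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node: Tree, maximizing: bool, counter: List[int],
              alpha: float = -math.inf, beta: float = math.inf) -> float:
    """Minimax with alpha-beta pruning: same value, fewer visited nodes."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    value = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alphabeta(child, not maximizing, counter, alpha, beta)
        if maximizing:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:
            break  # Remaining siblings cannot change the root decision.
    return value


# Agent (maximizer) moves first, guard (minimizer) replies.
tree = [[3, 5], [2, 9], [0, 1]]
full_count, pruned_count = [0], [0]
full_value = minimax(tree, True, full_count)
pruned_value = alphabeta(tree, True, pruned_count)
print(full_value, pruned_value, full_count[0], pruned_count[0])
```

Both searches return the same root value, while the pruned search visits fewer nodes; on deeper trees the gap grows quickly.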
RO · Jun 7, 2017
Distributed Simultaneous Action and Target Assignment for Multi-Robot Multi-Target Tracking
Yoonchang Sung, Ashish Kumar Budhiraja, Ryan K. Williams et al.
We study a multi-robot assignment problem for multi-target tracking. The proposed problem can be viewed as a mixed packing and covering problem. To cope with limited sensing and communication ranges, we take a distributed approach. A local algorithm gives theoretical bounds on both the running time and the approximation ratio relative to an optimal solution. We employ a local algorithm for max-min linear programs to solve the proposed task. Simulation results show that the local algorithm is an effective solution to multi-robot task allocation.