74.2 LG May 29
Convergence of Two-Timescale Markovian Stochastic Approximations with Applications in Reinforcement Learning
Vagul Mahadevan, Claire Chen, Shuze Daniel Liu et al.
This work studies the convergence of two-timescale stochastic approximations (SA), a class of iterative algorithms that update two sets of parameters in fast and slow timescales respectively. Notable examples of two-timescale SA in reinforcement learning (RL) include temporal difference learning with gradient correction (TDC) and actor-critic methods. Previously, the stability (i.e., boundedness) and convergence of two-timescale SA were only established under i.i.d. noise. This work instead establishes the stability and convergence of two-timescale SA under Markovian noise, a setup that is more realistic in RL. Notably, we do not need to use any projection operator and the noise does not need to live in a compact space. Our key technical novelty is to control the fast timescale parameter with the running max of the slow timescale parameter, instead of with the current slow timescale parameter, as most prior works do. As a key application, we establish the first almost sure convergence of TDC with eligibility traces under off-policy learning with linear function approximation.
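To make the two-timescale setup concrete, the following is a minimal toy sketch, not the paper's algorithm or analysis. It assumes a scalar linear problem in which the fast iterate `w` tracks `2 * theta` and the slow iterate `theta` is driven toward the point where `1 - w = 0`, so the joint fixed point is `theta = 0.5`, `w = 1`. The noise is a two-state Markov chain on {-1, +1}, so it is correlated over time rather than i.i.d. The function name `run_two_timescale_sa`, the parameter `p_stay`, the target equations, and the step-size exponents are all illustrative assumptions.

```python
import random


def run_two_timescale_sa(num_steps, seed=0, p_stay=0.9):
    """Toy two-timescale SA driven by a two-state Markov noise chain.

    Hypothetical example: all constants here are illustrative choices,
    not taken from the paper.
    """
    rng = random.Random(seed)
    theta, w = 0.0, 0.0  # slow and fast iterates
    y = 1.0              # Markovian noise state in {-1, +1}, zero mean in stationarity

    for n in range(num_steps):
        alpha = 1.0 / (n + 1)         # slow step size
        beta = 1.0 / (n + 1) ** 0.6   # fast step size; alpha / beta -> 0

        # Markov transition: keep the current state with probability p_stay,
        # otherwise flip sign. Successive noise values are therefore correlated.
        if rng.random() >= p_stay:
            y = -y

        # Fast timescale: w tracks 2 * theta for the (nearly frozen) theta.
        w_next = w + beta * (2.0 * theta - w + y)
        # Slow timescale: theta moves toward the point where 1 - w = 0.
        theta = theta + alpha * (1.0 - w + y)
        w = w_next

    return theta, w


if __name__ == "__main__":
    theta, w = run_two_timescale_sa(100_000)
    print(f"theta = {theta:.3f}, w = {w:.3f}")
```

No projection step is applied to either iterate, which mirrors the projection-free setting described in the abstract. The toy is far simpler than TDC: the noise here is bounded and the dynamics are scalar and linear.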
44.2 RO May 9
GameChat: Multi-LLM Dialogue for Safe, Agile, and Socially Optimal Multi-Agent Navigation in Constrained Environments
Vagul Mahadevan, Shangtong Zhang, Rohan Chandra
Safe, agile, and socially compliant multi-robot navigation in cluttered and constrained environments remains a critical challenge. It is especially difficult in decentralized settings with self-interested agents that have unique, unknown priorities, where there is no central authority to resolve conflicts induced by spatial symmetry. We address this challenge by proposing GameChat, an intuitive but effective approach that facilitates safe, agile, and deadlock-free navigation for both cooperative and self-interested agents in cluttered environments. Key to our approach is the idea that agents should resolve conflicts on their own by communicating in natural language, much like humans. We evaluate GameChat in simulated environments with doorways and intersections. The results show that, even in the worst case, GameChat reduces the time for all agents to reach their goals by over 35% relative to a naive baseline and by over 20% relative to a state-of-the-art baseline in the intersection scenario, while doubling the rate at which the agent with the higher-priority task reaches its goal first, from 50% (equivalent to random chance) to 100%. We also demonstrate how GameChat can be extended to more than two agents.