4.6SYApr 17
Real-Time Solution-Seeking for Game-Theoretic Autonomous Driving via Time-Distributed IterationsShaoqing Liu, Mushuang Liu
Computational complexity has been a major challenge in game-theoretic model predictive control (GT-MPC), as real-time solutions to a game (e.g., Nash equilibria (NEs)) have to be computed at each sampling instant of an MPC. This challenge is especially critical in autonomous driving, where interactions may involve many agents, and decisions must be made at fast sampling rates. We show that this challenge can be addressed through time-distributed solution-seeking iterations designed based on, e.g., Newton and Newton--Kantorovich methods. Specifically, the autonomous vehicle decision-making problem is first formulated as a GT-MPC problem. To ensure solution attainability, a potential game framework is adopted. Within this framework, both potential-function optimization and best-response dynamics are used to seek the NE. To enable real-time implementation, Newton and Newton--Kantorovich methods are employed to solve the optimization problems arising in the NE-seeking algorithms, with their iterations distributed over time. Numerical experiments on an intersection-crossing scenario demonstrate that the proposed methods achieve effective real-time performance.
29.8SYMar 19
Markov Potential Game and Multi-Agent Reinforcement Learning for Autonomous DrivingHuiwen Yan, Mushuang Liu
Autonomous driving (AD) requires safe and reliable decision-making among interacting agents, e.g., vehicles, bicycles, and pedestrians. Multi-agent reinforcement learning (MARL) modeled by Markov games (MGs) provides a suitable framework to characterize such agents' interactions during decision-making. Nash equilibria (NEs) are often the desired solution in an MG. However, it is typically challenging to compute an NE in general-sum games, unless the game is a Markov potential game (MPG), which ensures the NE attainability under a few learning algorithms such as gradient play. However, it has been an open question how to construct an MPG and whether these construction rules are suitable for AD applications. In this paper, we provide sufficient conditions under which an MG is an MPG and show that these conditions can accommodate general driving objectives for autonomous vehicles (AVs) using highway forced merge scenarios as illustrative examples. A parameter-sharing neural network (NN) structure is designed to enable decentralized policy execution. The trained driving policy from MPGs is evaluated in both simulated and naturalistic traffic datasets. Comparative studies with single-agent RL and with human drivers whose behaviors are recorded in the traffic datasets are reported, respectively.
LGSep 30, 2025
Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed ApproximationYang Zhang, Huiwen Yan, Mushuang Liu
Model-Agnostic Meta-Learning (MAML) is a versatile meta-learning framework applicable to both supervised learning and reinforcement learning (RL). However, applying MAML to meta-reinforcement learning (meta-RL) presents notable challenges. First, MAML relies on second-order gradient computations, leading to significant computational and memory overhead. Second, the nested structure of optimization increases the problem's complexity, making convergence to a global optimum more challenging. To overcome these limitations, we propose Directed-MAML, a novel task-directed meta-RL algorithm. Before the second-order gradient step, Directed-MAML applies an additional first-order task-directed approximation to estimate the effect of second-order gradients, thereby accelerating convergence to the optimum and reducing computational cost. Experimental results demonstrate that Directed-MAML surpasses MAML-based baselines in computational efficiency and convergence speed in the scenarios of CartPole-v1, LunarLander-v2 and two-vehicle intersection crossing. Furthermore, we show that task-directed approximation can be effectively integrated into other meta-learning algorithms, such as First-Order Model-Agnostic Meta-Learning (FOMAML) and Meta Stochastic Gradient Descent(Meta-SGD), yielding improved computational efficiency and convergence speed.