MA AI ET LG · Jan 23, 2025

BMG-Q: Localized Bipartite Match Graph Attention Q-Learning for Ride-Pooling Order Dispatch

arXiv:2501.13448v15.15 citations · h-index: 3 · IEEE Transactions on Intelligent Transportation Systems (Print)

Originality: Incremental advance

AI Analysis

The paper addresses ride-pooling order dispatch for large fleets, offering incremental improvements in scalability and robustness.

This paper tackles ride-pooling order dispatch by introducing BMG-Q, a Multi-Agent Reinforcement Learning framework that combines a localized bipartite match graph with a Graph Attention Double Deep Q Network. The authors report an improvement of around 10% in accumulative rewards and a reduction of more than 50% in overestimation bias compared with benchmark reinforcement learning frameworks.
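To make the overestimation-bias point concrete, here is a minimal sketch of the generic Double DQN target next to the plain DQN target. It is not the paper's GATDDQN: the function names, arguments, and numbers are hypothetical, and the Q-values are given as plain lists rather than produced by a graph attention network.

```python
from typing import Sequence


def dqn_target(reward: float, gamma: float,
               q_target_next: Sequence[float], done: bool = False) -> float:
    """Plain DQN target: the target network both picks and scores the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool = False) -> float:
    """Double DQN target: the online network picks the next action,
    the target network scores it. Decoupling selection from evaluation
    is what dampens the upward bias of the max operator."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for three candidate actions in the next state.
q_online = [0.2, 0.8, 0.5]
q_target = [0.3, 0.4, 1.0]

plain = dqn_target(1.0, 0.5, q_target)
double = double_dqn_target(1.0, 0.5, q_online, q_target)
print(plain, double)
```

In this toy case the plain target uses the largest target-network value, while the Double DQN target uses the target-network value of the action the online network prefers, so it comes out lower.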

This paper introduces Localized Bipartite Match Graph Attention Q-Learning (BMG-Q), a novel Multi-Agent Reinforcement Learning (MARL) algorithm framework tailored for ride-pooling order dispatch. BMG-Q advances the ride-pooling decision-making process with the localized bipartite match graph underlying the Markov Decision Process, enabling the development of a novel Graph Attention Double Deep Q Network (GATDDQN) as the MARL backbone to capture the dynamic interactions among ride-pooling vehicles in the fleet. Our approach enriches the state information for each agent with GATDDQN by leveraging a localized bipartite interdependence graph and enables a centralized global coordinator to optimize order matching and agent behavior using Integer Linear Programming (ILP). Enhanced by gradient clipping and localized graph sampling, our GATDDQN improves scalability and robustness. Furthermore, the inclusion of a posterior score function in the ILP captures the online exploration-exploitation trade-off and reduces the potential overestimation bias of agents, thereby improving the quality of the derived solutions. Through extensive experiments and validation, BMG-Q has demonstrated superior performance in both training and operations for thousands of vehicle agents, outperforming benchmark reinforcement learning frameworks by around 10% in accumulative rewards and reducing overestimation bias by over 50%. Additionally, it maintains robustness amidst task variations and fleet size changes, establishing BMG-Q as an effective, scalable, and robust framework for advancing ride-pooling order dispatch operations.
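The centralized matching step can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the paper's formulation: it replaces the ILP solver with a brute-force search that is only practical for tiny instances, it lets each vehicle take at most one order (ignoring pooling capacity), and it treats the per-pair scores as given because the form of the posterior score function is not stated here. The name `best_matching` and the score values are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def best_matching(scores: List[List[float]]) -> Tuple[float, List[Optional[int]]]:
    """Exhaustively find the vehicle-to-order assignment with the highest total score.

    scores[v][o] is the score of giving order o to vehicle v.
    Each order goes to at most one vehicle; a vehicle may stay idle (score 0).
    Returns (total_score, assignment), where assignment[v] is an order index or None.
    """
    n_vehicles = len(scores)
    n_orders = len(scores[0]) if scores else 0
    # Start from the all-idle solution, which is always feasible.
    best: Tuple[float, List[Optional[int]]] = (0.0, [None] * n_vehicles)

    def search(v: int, used: Set[int], total: float,
               assignment: List[Optional[int]]) -> None:
        nonlocal best
        if v == n_vehicles:
            if total > best[0]:
                best = (total, assignment.copy())
            return
        # Option 1: vehicle v stays idle.
        assignment.append(None)
        search(v + 1, used, total, assignment)
        assignment.pop()
        # Option 2: vehicle v takes any order not yet assigned.
        for o in range(n_orders):
            if o not in used:
                used.add(o)
                assignment.append(o)
                search(v + 1, used, total + scores[v][o], assignment)
                assignment.pop()
                used.remove(o)

    search(0, set(), 0.0, [])
    return best


# Two vehicles, two orders: both vehicles score order 0 highest.
scores = [[5.0, 4.0],
          [6.0, 1.0]]
total, assignment = best_matching(scores)
print(total, assignment)
```

Both vehicles individually prefer order 0, so independent greedy choices would conflict. The coordinated search instead gives order 1 to vehicle 0 and order 0 to vehicle 1, which is the kind of conflict resolution a global coordinator provides. A real deployment with thousands of agents would need an actual ILP or assignment solver rather than enumeration.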
