LG ML

Aug 1, 2022

Model-based graph reinforcement learning for inductive traffic signal control

François-Xavier Devailly, Denis Larocque, Laurent Charlin

arXiv:2208.00659v14.612 citations

h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses inefficient, non-scalable traffic signal control, a problem faced by urban planners and transportation systems. It offers a more flexible and transferable solution, though it builds incrementally on prior transferable methods.

The paper tackles the lack of transferability in reinforcement learning for adaptive traffic signal control by introducing MuJAM, a model-based method that enables explicit coordination and generalization to unseen road networks and traffic settings, outperforming baselines in zero-shot transfer experiments involving up to 3,971 controllers.

Most reinforcement learning methods for adaptive traffic signal control must be trained from scratch before they can be applied to a new intersection, or after any modification to the road network, traffic distribution, or behavioral constraints experienced during training. Considering that 1) training such methods requires a massive amount of experience, and 2) this experience must be gathered by interacting in an exploratory fashion with real road-network users, this lack of transferability limits experimentation and applicability. Recent approaches learn policies that generalize to unseen road-network topologies and traffic distributions, partially tackling this challenge. However, the literature remains divided between learning cyclic policies (the evolution of connectivity at an intersection must respect a cycle) and acyclic (less constrained) policies, and these transferable methods 1) are only compatible with cyclic constraints and 2) do not enable coordination. We introduce a new model-based method, MuJAM, which, on top of enabling explicit coordination at scale for the first time, pushes generalization further by also generalizing across the controllers' constraints. In a zero-shot transfer setting involving both road networks and traffic settings never experienced during training, and in a larger transfer experiment involving the control of 3,971 traffic signal controllers in Manhattan, we show that MuJAM, using both cyclic and acyclic constraints, outperforms domain-specific baselines as well as another transferable approach.
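The distinction between cyclic and acyclic constraints can be illustrated with a minimal sketch. This is not MuJAM's implementation: the function name `legal_next_phases`, the integer phase encoding, and the reading of "respecting a cycle" as "keep the current phase or advance to its successor" are all assumptions made for illustration.

```python
from typing import List


def legal_next_phases(current: int, n_phases: int, cyclic: bool) -> List[int]:
    """Return the phase indices a controller may select at the next step.

    Hypothetical helper: phases are assumed to be numbered 0..n_phases-1
    in the order of the intersection's signal cycle.
    """
    if not 0 <= current < n_phases:
        raise ValueError("current phase index out of range")
    if cyclic:
        # Assumed cyclic constraint: stay in the current phase or move to
        # the next phase in the fixed cycle (wrapping around at the end).
        return sorted({current, (current + 1) % n_phases})
    # Assumed acyclic (less constrained) case: any phase may be selected.
    return list(range(n_phases))
```

Under these assumptions, a cyclic controller chooses between at most two options at each step, while an acyclic controller chooses among all `n_phases` phases, which is why a method compatible with only one kind of constraint cannot be transferred directly to the other.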
