DC AI MA May 24, 2023

Distributed Online Rollout for Multivehicle Routing in Unmapped Environments

Jamison W. Weber, Dhanush R. Giriyan, Devendra R. Parkar, Dimitri P. Bertsekas, Andréa W. Richa

arXiv:2305.15596

Originality: Incremental advance

AI Analysis

The paper addresses multiagent routing under the strict local communication and sensing constraints found in real-world applications, and represents an incremental advance over centralized approaches.

The paper tackles the multivehicle routing problem in unmapped environments without centralized control, proposing a distributed online reinforcement learning algorithm in which agents self-organize into local clusters and apply multiagent rollout within each cluster. The algorithm achieves approximately a factor of two cost improvement over a greedy base policy for sensing radii between two and three times a critical sensing radius.
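To make the comparison between rollout and a greedy base policy concrete, here is a minimal Python sketch of one-agent-at-a-time rollout. It is not the paper's distributed algorithm: it assumes a single group of agents with full knowledge of the task locations, an obstacle-free grid, unit cost per move, and free waiting. The names `greedy_policy`, `rollout_policy`, `step`, and `run` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: centralized, fully observed, obstacle-free grid.
# All names and the cost model (1 per move, 0 for waiting) are assumptions.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]  # last entry = wait


def dist(a, b):
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_policy(agents, tasks):
    """Base policy: every agent steps toward its nearest remaining task."""
    moves = []
    for a in agents:
        target = min(tasks, key=lambda t: (dist(a, t), t))
        moves.append(min(MOVES, key=lambda m: dist((a[0] + m[0], a[1] + m[1]), target)))
    return moves


def step(agents, tasks, moves):
    """Apply one joint move; return new agents, remaining tasks, stage cost."""
    agents = tuple((a[0] + m[0], a[1] + m[1]) for a, m in zip(agents, moves))
    cost = sum(m != (0, 0) for m in moves)
    return agents, tasks - set(agents), cost


def run(policy, agents, tasks):
    """Simulate a policy until every task has been visited; return total cost."""
    agents = tuple(agents)
    tasks = frozenset(tasks) - set(agents)
    total = 0
    while tasks:
        agents, tasks, cost = step(agents, tasks, policy(agents, tasks))
        total += cost
    return total


def rollout_policy(agents, tasks):
    """One-agent-at-a-time rollout with the greedy policy as base policy."""
    moves = greedy_policy(agents, tasks)  # later agents default to the base policy
    for i in range(len(agents)):
        # Try the base move first so that ties keep it (avoids endless waiting).
        candidates = [moves[i]] + [m for m in MOVES if m != moves[i]]
        best_move, best_total = None, float("inf")
        for m in candidates:
            trial = moves[:i] + [m] + moves[i + 1:]
            nxt_agents, nxt_tasks, cost = step(agents, tasks, trial)
            # Cost-to-go is estimated by simulating the base policy to the end.
            total = cost + run(greedy_policy, nxt_agents, nxt_tasks)
            if total < best_total:
                best_move, best_total = m, total
        moves[i] = best_move  # fix agent i's choice before optimizing agent i+1
    return moves


if __name__ == "__main__":
    agents = [(0, 0), (1, 0)]
    tasks = [(4, 0), (0, 3), (-2, -2)]
    print("greedy cost: ", run(greedy_policy, agents, tasks))
    print("rollout cost:", run(rollout_policy, agents, tasks))
```

Because each agent evaluates the base policy's own move first and switches only on a strict improvement, the rollout cost in this sketch never exceeds the greedy cost. The size of the gain depends on the instance; the factor of two reported above refers to the paper's experiments, not to this toy setting.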

In this work we consider a generalization of the well-known multivehicle routing problem: given a network, a set of agents occupying a subset of its nodes, and a set of tasks, we seek a minimum cost sequence of movements subject to the constraint that each task is visited by some agent at least once. The classical version of this problem assumes a central computational server that observes the entire state of the system perfectly and directs individual agents according to a centralized control scheme. In contrast, we assume that there is no centralized server and that each agent is an individual processor with no a priori knowledge of the underlying network (including task and agent locations). Moreover, our agents possess strictly local communication and sensing capabilities (restricted to a fixed radius around their respective locations), aligning more closely with several real-world multiagent applications. These restrictions introduce many challenges that are overcome through local information sharing and direct coordination between agents. We present a fully distributed, online, and scalable reinforcement learning algorithm for this problem whereby agents self-organize into local clusters and independently apply a multiagent rollout scheme locally to each cluster. We demonstrate empirically via extensive simulations that there exists a critical sensing radius beyond which the distributed rollout algorithm begins to improve over a greedy base policy. This critical sensing radius grows proportionally to the $\log^*$ function of the size of the network, and is, therefore, a small constant for any relevant network. Our decentralized reinforcement learning algorithm achieves approximately a factor of two cost improvement over the base policy for a range of radii bounded from below and above by two and three times the critical sensing radius, respectively.
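The claim that the critical sensing radius is a small constant for any relevant network follows from how slowly the iterated logarithm $\log^*$ grows. A minimal Python sketch, assuming the conventional base-2 definition (the abstract does not state the base) and a hypothetical helper name `log_star`:

```python
import math


def log_star(n):
    """Iterated logarithm: how many times log2 must be applied to reach a value <= 1.

    Base 2 is an assumption here; the abstract does not specify the base.
    """
    count = 0
    x = float(n)
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count


if __name__ == "__main__":
    for n in (2, 16, 65536, 10**80):
        print(n, log_star(n))
```

Under this definition `log_star(65536)` is 4 and `log_star(10**80)` is 5, so a quantity proportional to $\log^*$ of the network size stays nearly constant even for very large networks.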
