Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
This work addresses efficient multi-agent task completion in warehouse logistics. It offers a novel method that improves performance and scalability, though the contribution is incremental in that it builds on established sequence modeling and Transformer paradigms.
The paper tackles the challenge of Multi-Agent Pickup and Delivery (MAPD) in warehouse environments with narrow pathways, where existing learning-based methods either perform poorly because they rely on local observations or incur high computational complexity from point-to-point communication. It proposes the Sequential Pathfinder (SePar), which uses a Transformer-based approach to reduce decision-making complexity from exponential to linear, outperforming existing learning-based methods and generalizing to unseen environments.
Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduces high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring their effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.
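The exponential-to-linear claim can be illustrated with a toy sketch of sequential (agent-by-agent) decision-making. This is not SePar itself: the Transformer policy is replaced here by a hypothetical greedy Manhattan-distance score, the grid is unbounded, and only vertex conflicts (two agents entering the same cell) are checked. All names (`sequential_decide`, `joint_action_count`, `ACTIONS`) are assumptions made for illustration. The point is only that choosing actions one agent at a time, conditioned on the choices of earlier agents, scores n * |A| candidates, whereas searching the joint action space would score |A|^n.

```python
# Toy illustration only: a greedy stand-in for a learned sequential policy.
ACTIONS = ["stay", "up", "down", "left", "right"]
MOVES = {"stay": (0, 0), "up": (0, -1), "down": (0, 1),
         "left": (-1, 0), "right": (1, 0)}


def step(pos, action):
    """Return the cell reached from pos by taking action."""
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)


def manhattan(a, b):
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sequential_decide(starts, goals):
    """Choose one action per agent, in order, conditioned on earlier choices.

    Returns (actions, evaluations), where evaluations counts how many
    (agent, action) candidates were scored.
    """
    actions, reserved, evaluations = [], set(), 0
    for pos, goal in zip(starts, goals):
        best = None  # (cost, action, next_cell)
        for action in ACTIONS:
            evaluations += 1
            nxt = step(pos, action)
            if nxt in reserved:  # cell already claimed by an earlier agent
                continue
            cost = manhattan(nxt, goal)
            if best is None or cost < best[0]:
                best = (cost, action, nxt)
        if best is None:  # every candidate cell is taken: wait in place
            best = (manhattan(pos, goal), "stay", pos)
        actions.append(best[1])
        reserved.add(best[2])
    return actions, evaluations


def joint_action_count(num_agents):
    """Size of the joint action space a centralized search would enumerate."""
    return len(ACTIONS) ** num_agents


if __name__ == "__main__":
    starts = [(0, 0), (2, 0)]
    goals = [(1, 0), (1, 0)]
    chosen, scored = sequential_decide(starts, goals)
    print(chosen, scored, joint_action_count(len(starts)))
```

With two agents the gap is small (10 scored candidates against 25 joint actions), but for 20 agents the sequential pass scores 100 candidates while the joint space has 5^20 entries. The order in which agents are processed is fixed by list position in this sketch; the order-invariance result stated in the abstract concerns the learned policy, not this greedy stand-in.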