Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies
This work addresses a domain-specific problem in UAV swarm control, but its contribution is incremental: it focuses on scaling studies rather than novel applications.
The study applied Multi-Agent Reinforcement Learning (MARL) to the decentralized control of UAV swarms delivering critical data. It found that off-the-shelf MARL algorithms performed competitively with a baseline for small numbers of agents but ran into scalability issues as the number of agents increased.
This work presents a conceptual study on the application of Multi-Agent Reinforcement Learning (MARL) to the decentralized control of unmanned aerial vehicles that relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for MARL scaling studies. A robust baseline policy is proposed, which is based on restricting agent motion envelopes and applying Dijkstra's algorithm. Experimental results show that two off-the-shelf MARL algorithms perform competitively with the baseline for a small number of agents, but scalability issues arise as the number of agents increases.
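To make the idea of the baseline concrete, the following is a minimal sketch of how restricted motion envelopes and Dijkstra's algorithm could be combined to plan a relay route. It is not the paper's implementation. The envelope model (each agent confined to a disc of fixed radius around a home position), the hand-off rule (two agents can exchange the package if their discs come within a communication range), the edge cost (distance between disc centers), and the names `relay_graph` and `dijkstra` are all assumptions made for illustration.

```python
import heapq
import math


def relay_graph(centers, radius, comm_range):
    """Build a hand-off graph under a hypothetical envelope model.

    Assumption: agent i may only move inside a disc of `radius` around
    centers[i]. Agents i and j can hand off the package if the gap between
    their discs is at most `comm_range`. The edge cost is the distance
    between the disc centers.
    """
    adj = {i: [] for i in range(len(centers))}
    for i, (xi, yi) in enumerate(centers):
        for j, (xj, yj) in enumerate(centers):
            if i == j:
                continue
            center_dist = math.hypot(xi - xj, yi - yj)
            if center_dist - 2 * radius <= comm_range:
                adj[i].append((j, center_dist))
    return adj


def dijkstra(adj, source, target):
    """Return (cost, path) of the cheapest route, or (inf, []) if none exists."""
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            # Walk the predecessor links back to the source.
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return math.inf, []


if __name__ == "__main__":
    # Hypothetical layout: three agents in a line and one out of reach.
    centers = [(0, 0), (3, 0), (6, 0), (3, 4)]
    adj = relay_graph(centers, radius=1.0, comm_range=1.5)
    print(dijkstra(adj, 0, 2))
    print(dijkstra(adj, 0, 3))
```

In this layout the package can travel from agent 0 to agent 2 only through agent 1, and agent 3 cannot be reached at all, which shows how restricting envelopes turns the continuous control problem into a shortest-path search over a sparse graph.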