RO · Dec 21, 2020

Multi-Agent Reinforcement Learning for Dynamic Ocean Monitoring by a Swarm of Buoys

arXiv:2012.11641v1 · 33 citations
AI Analysis

This work provides a structured learning approach for multi-robot systems to improve dynamic ocean monitoring, which is beneficial for environmental scientists and marine researchers. This is an incremental improvement over existing swarming methods.

This paper addresses the problem of dynamic ocean monitoring using a swarm of buoys by proposing two multi-agent reinforcement learning (MARL) approaches for area coverage. The coverage-range-based MARL method, a modified MADDPG, demonstrated superior performance in agent spreading and learning convergence compared to a swarm-based MARL and a naive swarming method.

The autonomous marine environmental monitoring problem traditionally encompasses an area coverage problem that can only be solved effectively by a multi-robot system. In this paper, we focus on robotic swarms, which are typically operated and controlled by means of simple swarming behaviors obtained from a subtle yet ad hoc combination of bio-inspired strategies. We propose a novel and structured approach for area coverage using multi-agent reinforcement learning (MARL), which effectively deals with the non-stationarity of environmental features. Specifically, we propose two dynamic area coverage approaches: (1) swarm-based MARL, and (2) coverage-range-based MARL. The former is trained using the multi-agent deep deterministic policy gradient (MADDPG) approach, whereas a modified version of MADDPG is introduced for the latter, with a reward function that intrinsically leads to a collective behavior. Both methods are tested and validated on regions of different geometric shapes but equal surface area (square vs. rectangle), yielding acceptable area coverage and benefiting from the structured learning in non-stationary environments. Both approaches are advantageous compared to a naïve swarming method. However, coverage-range-based MARL outperforms swarm-based MARL, with stronger convergence in the learning criteria and greater spreading of agents for area coverage.
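To make the notion of area coverage by agents with a fixed coverage range concrete, here is a minimal sketch of how a coverage fraction could be estimated for a rectangular region. It is not the paper's reward function or evaluation metric. The function name `coverage_fraction`, the disc-shaped coverage model, and the grid resolution `cells_per_unit` are all assumptions made for illustration.

```python
import numpy as np


def coverage_fraction(positions, coverage_range, width, height, cells_per_unit=20):
    """Estimate the fraction of a width x height region that lies within
    `coverage_range` of at least one agent (hypothetical disc coverage model)."""
    positions = np.asarray(positions, dtype=float).reshape(-1, 2)
    if len(positions) == 0:
        return 0.0

    # Sample the region at the centres of a regular grid of cells.
    nx = int(round(width * cells_per_unit))
    ny = int(round(height * cells_per_unit))
    xs = (np.arange(nx) + 0.5) * (width / nx)
    ys = (np.arange(ny) + 0.5) * (height / ny)
    grid = np.stack(np.meshgrid(xs, ys, indexing="ij"), axis=-1).reshape(-1, 2)

    # Squared distance from every sample point to every agent.
    d2 = ((grid[:, None, :] - positions[None, :, :]) ** 2).sum(axis=-1)

    # A sample point is covered if any agent is within range of it.
    covered = (d2 <= coverage_range ** 2).any(axis=1)
    return float(covered.mean())


# Two regions of equal surface area (16 units^2), as in the square vs. rectangle setup.
clustered = [(2.0, 2.0)] * 4                      # all agents at one point
spread_square = [(1, 1), (1, 3), (3, 1), (3, 3)]  # agents spread over a 4 x 4 square
spread_rect = [(1, 1), (3, 1), (5, 1), (7, 1)]    # agents spread over an 8 x 2 rectangle

print(coverage_fraction(clustered, 1.0, 4.0, 4.0))
print(coverage_fraction(spread_square, 1.0, 4.0, 4.0))
print(coverage_fraction(spread_rect, 1.0, 8.0, 2.0))
```

Under these assumptions, agents that spread out cover more of the region than agents that stay clustered, which is the intuition behind comparing methods by how well they spread the swarm.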

