LG MA Jun 20, 2023

Cooperative Multi-Agent Learning for Navigation via Structured State Abstraction

Mohamed K. Abdelaziz, Mohammed S. Elbamby, Sumudu Samarakoon, Mehdi Bennis

arXiv:2306.11336v23.88 citations h-index: 83

Originality: Incremental advance

AI Analysis

This work addresses the challenge of high state space complexity in cooperative multi-agent navigation, offering an incremental improvement over existing methods.

The paper tackles the complexity of jointly learning navigation policies and communication protocols in multi-agent reinforcement learning. It proposes a neural network architecture for adaptive state space abstraction that reduces the size of the state space without performance loss and reaches better policies in fewer training iterations.

Cooperative multi-agent reinforcement learning (MARL) for navigation enables agents to cooperate to achieve their navigation goals. Using emergent communication, agents learn a communication protocol to coordinate and share the information needed to achieve their navigation tasks. In emergent communication, agents exchange symbols with no pre-specified usage rules, and the meaning and syntax of those symbols emerge through training. Learning a navigation policy along with a communication protocol in a MARL environment is highly complex due to the huge state space to be explored. To cope with this complexity, this work proposes a novel neural network architecture for jointly learning an adaptive state space abstraction and a communication protocol among agents participating in navigation tasks. The goal is an adaptive abstractor that significantly reduces the size of the state space to be explored without degrading policy performance. Simulation results show that the proposed method reaches a better policy, in terms of achievable rewards, in fewer training iterations than when raw states or a fixed state abstraction are used. Moreover, a communication protocol emerges during training that enables the agents to learn better policies within fewer training iterations.
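To give a feel for why abstraction shrinks the space to be explored, here is a minimal sketch of the fixed state abstraction baseline mentioned in the abstract, not the paper's learned adaptive abstractor. It assumes a hypothetical 2D grid world in which each agent's state is only its cell position, and it ignores collisions between agents. The function names, the grid size, and the block size are all illustrative assumptions.

```python
def abstract_position(pos, block):
    """Map a fine grid cell (x, y) to the coarse block that contains it.

    Hypothetical fixed abstraction: cells are grouped into block x block
    super-cells, regardless of what the agents have learned.
    """
    x, y = pos
    return (x // block, y // block)


def joint_state_count(width, height, n_agents, block=1):
    """Count joint position states for n_agents on a width x height grid
    after grouping cells into block x block super-cells (block=1 is raw)."""
    cols = -(-width // block)   # ceiling division
    rows = -(-height // block)
    return (cols * rows) ** n_agents


# Assumed example: two agents on a 16 x 16 grid.
raw = joint_state_count(16, 16, n_agents=2)              # 256 ** 2
coarse = joint_state_count(16, 16, n_agents=2, block=4)  # 16 ** 2
print(raw, coarse, abstract_position((5, 10), 4))
```

Under these assumptions the joint state count drops from 65536 to 256. A fixed grouping like this cannot adapt its resolution to where precision matters, which is the limitation the adaptive abstractor described above is meant to address.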
