LG, MA, SP, SY | Feb 8, 2023

Policy Evaluation in Decentralized POMDPs with Belief Sharing

arXiv:2302.04151v2 | 9 citations | h-index: 87
Originality: Incremental advance
AI Analysis

This work addresses multi-agent reinforcement learning in partially observable environments, which is relevant to applications such as sensor networks. It is an incremental advance because it builds on existing belief-sharing methods.

The paper tackles cooperative policy evaluation in decentralized partially observable Markov decision processes (Dec-POMDPs). It proposes a fully decentralized belief-forming strategy in which agents also exchange value function parameter estimates over a communication network, and it shows analytically that the agents' parameters stay within a bounded difference of a centralized baseline.

Most works on multi-agent reinforcement learning focus on scenarios where the state of the environment is fully observable. In this work, we consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly. Instead, agents can only have access to noisy observations and to belief vectors. It is well-known that finding global posterior distributions under multi-agent settings is generally NP-hard. As a remedy, we propose a fully decentralized belief forming strategy that relies on individual updates and on localized interactions over a communication network. In addition to the exchange of the beliefs, agents exploit the communication network by exchanging value function parameter estimates as well. We analytically show that the proposed strategy allows information to diffuse over the network, which in turn allows the agents' parameters to have a bounded difference with a centralized baseline. A multi-sensor target tracking application is considered in the simulations.
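The two-layer exchange described in the abstract (beliefs first, value function parameters second) can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's algorithm: it assumes each agent does a local Bayesian update and then fuses neighbors' beliefs by weighted geometric averaging, that the value function is linear in the belief vector, and that parameters are updated with TD(0) followed by a weighted average over neighbors. All function names, the step size, and the discount factor are hypothetical.

```python
import math


def normalize(v):
    """Scale a list of non-negative numbers so it sums to one."""
    total = sum(v)
    return [x / total for x in v]


def local_bayes_update(belief, transition, likelihood):
    """Individual update: propagate the belief through the transition model,
    then weight by the likelihood of the agent's own noisy observation.
    transition[sp][s] is P(next state s | current state sp) (assumed layout)."""
    n = len(belief)
    predicted = [sum(transition[sp][s] * belief[sp] for sp in range(n))
                 for s in range(n)]
    return normalize([likelihood[s] * predicted[s] for s in range(n)])


def combine_beliefs(neighbor_beliefs, weights):
    """Localized interaction (assumed rule): weighted geometric average of the
    neighbors' intermediate beliefs, computed in the log domain."""
    n = len(neighbor_beliefs[0])
    log_mix = [sum(a * math.log(max(psi[s], 1e-12))
                   for a, psi in zip(weights, neighbor_beliefs))
               for s in range(n)]
    peak = max(log_mix)  # subtract the maximum for numerical stability
    return normalize([math.exp(x - peak) for x in log_mix])


def td_adapt(w, belief, next_belief, reward, step=0.1, discount=0.9):
    """TD(0) step for a value function assumed linear in the belief:
    V(belief) = sum_i belief[i] * w[i]."""
    value = sum(b * x for b, x in zip(belief, w))
    next_value = sum(b * x for b, x in zip(next_belief, w))
    td_error = reward + discount * next_value - value
    return [x + step * td_error * b for x, b in zip(w, belief)]


def combine_params(neighbor_params, weights):
    """Parameter exchange (assumed rule): weighted average of the neighbors'
    intermediate parameter estimates."""
    dim = len(neighbor_params[0])
    return [sum(a * phi[i] for a, phi in zip(weights, neighbor_params))
            for i in range(dim)]
```

In one iteration, each agent would call `local_bayes_update` on its own observation, exchange the result with its neighbors and call `combine_beliefs`, then do the same for parameters with `td_adapt` and `combine_params`. The weights passed to the combine steps are assumed to be non-negative and to sum to one over the agent's neighborhood.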

Code Implementations: 1 repo
