Modeling Others using Oneself in Multi-Agent Reinforcement Learning
This addresses the challenge of modeling other agents in multi-agent systems, which is incremental as it builds on existing inference methods.
The paper tackles the problem of multi-agent reinforcement learning with imperfect information, where agents must infer each other's hidden goals to maximize utility, and shows that the proposed Self Other-Modeling approach enables agents to learn better policies in cooperative and adversarial tasks.
We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and update its belief of their hidden state in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimate of the other players' hidden states, in both cooperative and adversarial settings.