Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative Markov Games
This paper addresses risk-sensitive decision-making for MARL agents that must model human-like preferences, but the contribution is incremental: it extends existing methods to a specific game type.
The paper incorporates risk-sensitive preferences into multi-agent reinforcement learning (MARL) by applying cumulative prospect theory (CPT) in network aggregative Markov games. The result is a distributed actor-critic algorithm that, under a set of assumptions, provably converges to a subjective Nash equilibrium. Experiments show that agents with higher loss aversion tend to socially isolate.
Classical multi-agent reinforcement learning (MARL) assumes risk neutrality and complete objectivity for agents. However, in settings where agents need to consider or model human economic or social preferences, a notion of risk must be incorporated into the RL optimization problem. This will be of greater importance in MARL where other human or non-human agents are involved, possibly with their own risk-sensitive policies. In this work, we consider risk-sensitive and non-cooperative MARL with cumulative prospect theory (CPT), a non-convex risk measure and a generalization of coherent measures of risk. CPT is capable of explaining loss aversion in humans and their tendency to overestimate/underestimate small/large probabilities. We propose a distributed sampling-based actor-critic (AC) algorithm with CPT risk for network aggregative Markov games (NAMGs), which we call Distributed Nested CPT-AC. Under a set of assumptions, we prove the convergence of the algorithm to a subjective notion of Markov perfect Nash equilibrium in NAMGs. The experimental results show that subjective CPT policies obtained by our algorithm can be different from the risk-neutral ones, and agents with a higher loss aversion are more inclined to socially isolate themselves in an NAMG.
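To make the CPT ingredients mentioned in the abstract concrete (loss aversion and distorted probabilities), here is a minimal sketch of the CPT value of a discrete random payoff. It is not the paper's Distributed Nested CPT-AC algorithm, which is sampling-based and distributed. The function names `weight` and `cpt_value` are hypothetical, the reference point is assumed to be zero, and the default parameters are the commonly cited Tversky and Kahneman (1992) estimates, assumed here for illustration rather than taken from this paper.

```python
def weight(p, gamma):
    """Inverse-S probability weighting (Tversky-Kahneman form, assumed here)."""
    p = min(max(p, 0.0), 1.0)  # guard against tiny floating-point overshoot
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)


def cpt_value(outcomes, probs, alpha=0.88, beta=0.88, lam=2.25,
              gamma_gain=0.61, gamma_loss=0.69):
    """CPT value of a discrete payoff; assumes probs sum to 1, reference point 0.

    lam is the loss-aversion coefficient: losses are scaled by lam > 1.
    """
    value = 0.0
    below = 0.0  # P(X < x) for the current outcome in ascending order
    for x, p in sorted(zip(outcomes, probs)):
        above = 1.0 - below - p  # P(X > x)
        if x >= 0:
            # Gains: decision weight from the weighted upper-tail probabilities.
            pi = weight(above + p, gamma_gain) - weight(above, gamma_gain)
            value += pi * x**alpha
        else:
            # Losses: decision weight from the weighted lower-tail probabilities.
            pi = weight(below + p, gamma_loss) - weight(below, gamma_loss)
            value -= lam * pi * (-x)**beta
        below += p
    return value


if __name__ == "__main__":
    fair_bet = ([-100.0, 100.0], [0.5, 0.5])
    print("CPT value of a fair bet:", cpt_value(*fair_bet))
    print("Same bet, risk-neutral settings:",
          cpt_value(*fair_bet, alpha=1, beta=1, lam=1, gamma_gain=1, gamma_loss=1))
```

With all exponents and `lam` set to 1, the sketch reduces to the ordinary expected value, which is the risk-neutral case classical MARL assumes. With the default parameters, a fair 50/50 bet gets a negative value because losses are scaled up by `lam`, and raising `lam` lowers the value further.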