Deception in Social Learning: A Multi-Agent Reinforcement Learning Perspective
This review addresses a critical safety issue in AI systems that is relevant to both researchers and practitioners, but its contribution is incremental: it surveys the problem rather than solving it.
The paper tackles deception in social learning within multi-agent reinforcement learning, where agents that can reshape others' reward functions may manipulate them into acting against their own interests. It reviews existing evidence and open problems and does not present new results.
Within the framework of Multi-Agent Reinforcement Learning, Social Learning is a new class of algorithms that enables agents to reshape the reward function of other agents with the goal of promoting cooperation and achieving higher global rewards in mixed-motive games. However, this capability gives agents unprecedented access to each other's learning process, which can drastically increase the risk of manipulation: an agent may not realize it is being deceived into adopting policies that are not actually in its own best interest. This research review introduces the problem statement, defines key concepts, critically evaluates existing evidence, and identifies open problems that should be addressed in future research.
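
The reward-reshaping mechanism described above can be illustrated with a toy sketch. It is not an algorithm from the reviewed literature: the epsilon-greedy bandit learner, the two-action setup, the payoff values, and the names `train_learner`, `greedy_action`, `env_reward`, and `manipulative` are all assumptions chosen for illustration. The learner trains on its environment reward plus an incentive supplied by another agent, and a manipulative incentive steers it toward the action that pays less in the environment.

```python
import random


def train_learner(env_reward, incentive, episodes=2000, lr=0.1, epsilon=0.1, seed=0):
    """Hypothetical epsilon-greedy bandit learner.

    Its training signal is the environment reward plus an incentive
    chosen by another agent (the 'reshaped' reward).
    """
    rng = random.Random(seed)
    q = [0.0 for _ in env_reward]
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(len(q))  # explore
        else:
            action = max(range(len(q)), key=lambda i: q[i])  # exploit
        shaped = env_reward[action] + incentive[action]
        q[action] += lr * (shaped - q[action])  # incremental value update
    return q


def greedy_action(q):
    """Index of the action with the highest estimated value."""
    return max(range(len(q)), key=lambda i: q[i])


# Assumed payoffs: action 0 is truly better for the learner.
env_reward = [1.0, 0.5]
no_incentive = [0.0, 0.0]
# Assumed manipulative incentive: makes action 1 look better during training.
manipulative = [0.0, 1.0]

honest_q = train_learner(env_reward, no_incentive)
manipulated_q = train_learner(env_reward, manipulative)

honest_choice = greedy_action(honest_q)
manipulated_choice = greedy_action(manipulated_q)
```

In this sketch the learner cannot distinguish the incentive from the environment reward, so the policy it learns under the manipulative incentive earns it less from the environment than the policy it learns unaided.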