Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games
This work addresses convergence issues in multi-agent systems for applications like game theory and robotics, representing an incremental improvement over existing gradient-based methods.
The paper tackles the challenge of multi-agent learning convergence in non-stationary environments by introducing GA-SPP, a gradient-based algorithm that uses shrinking policy prediction, and shows it achieves Nash convergence in broader settings than prior methods without requiring an infinitesimal learning rate.
Learning in a multi-agent system is challenging because agents are simultaneously learning and the environment is not stationary, undermining convergence guarantees. To address this challenge, this paper presents a new gradient-based learning algorithm, called Gradient Ascent with Shrinking Policy Prediction (GA-SPP), which augments the basic gradient ascent approach with the concept of shrinking policy prediction. The key idea behind this algorithm is that an agent adjusts its strategy in response to the forecasted strategy of the other agent, instead of its current one. GA-SPP is shown formally to have Nash convergence in larger settings than existing gradient-based multi-agent learning methods. Furthermore, unlike existing gradient-based methods, GA-SPP's theoretical guarantees do not assume the learning rate to be infinitesimal.