LG | AI | MA | SY | Jul 17, 2023

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

arXiv:2307.08794v1 | 3 citations | h-index: 19
Originality: Incremental advance
AI Analysis

The work addresses control problems in real-world complex systems with multiple timescales, but the advance is incremental: it builds on existing MARL methods with a specific adaptation.

The paper tackles the challenge of learning non-stationary policies in multi-timescale multi-agent reinforcement learning, introducing a simple framework that uses periodic time encoding and phase-functioned neural networks, validated on gridworld and building energy management environments.

In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.
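The two ingredients named in the abstract, a periodic time encoding derived from agent timescales and a phase-functioned layer, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the names joint_period, time_encoding, and PhaseFunctionedLinear are hypothetical, the joint period is taken to be the least common multiple of the agents' action intervals, and the weights are blended with a cyclic Catmull-Rom spline over four control points, as in the original phase-functioned networks for character animation.

```python
import math
from functools import reduce

import numpy as np


def joint_period(timescales):
    """Least common multiple of the agents' action intervals (assumed integers)."""
    return reduce(lambda a, b: a * b // math.gcd(a, b), timescales, 1)


def phase_of(t, period):
    """Map a discrete time step to a phase in [0, 2*pi)."""
    return 2.0 * math.pi * (t % period) / period


def time_encoding(t, period):
    """Periodic (sin, cos) encoding that repeats every `period` steps."""
    p = phase_of(t, period)
    return np.array([math.sin(p), math.cos(p)])


class PhaseFunctionedLinear:
    """Linear layer whose weights are a smooth periodic function of the phase.

    Hypothetical sketch: weights are stored at `n_control` points on the
    phase circle and blended with a cyclic Catmull-Rom spline.
    """

    def __init__(self, in_dim, out_dim, n_control=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_control, out_dim, in_dim))
        self.b = rng.normal(scale=0.1, size=(n_control, out_dim))

    @staticmethod
    def _blend(params, phase):
        n = params.shape[0]
        s = phase / (2.0 * math.pi) * n       # position along the control points
        k1 = int(math.floor(s)) % n           # control point at or before s
        w = s - math.floor(s)                 # fractional offset in [0, 1)
        y0, y1 = params[(k1 - 1) % n], params[k1]
        y2, y3 = params[(k1 + 1) % n], params[(k1 + 2) % n]
        # Catmull-Rom cubic: equals y1 at w = 0 and y2 at w = 1.
        return (y1
                + w * (0.5 * y2 - 0.5 * y0)
                + w ** 2 * (y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3)
                + w ** 3 * (1.5 * y1 - 1.5 * y2 + 0.5 * y3 - 0.5 * y0))

    def __call__(self, x, phase):
        W = self._blend(self.W, phase)
        b = self._blend(self.b, phase)
        return W @ x + b


# Example: three agents acting every 1, 2, and 4 steps (hypothetical values).
period = joint_period([1, 2, 4])
layer = PhaseFunctionedLinear(in_dim=3, out_dim=2)
obs = np.array([0.5, -1.0, 2.0])
out_now = layer(obs, phase_of(1, period))
out_next_cycle = layer(obs, phase_of(1 + period, period))
```

Because both the encoding and the blended weights depend on the time step only through its phase, the same observation at steps t and t + period produces the same output. That is the sense in which such a layer carries an inductive bias toward periodic policies.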

