LG · GT · Jun 30, 2022

Performative Reinforcement Learning

arXiv:2207.00046v2 · 25 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses decision-dependent environments in reinforcement learning, an incremental extension of performative prediction to RL.

The paper studies reinforcement learning in which the learner's policy affects the environment's reward and transition dynamics. It introduces the notion of a performatively stable policy, shows convergence to it in several settings, and validates the results experimentally on a grid-world environment.

We introduce the framework of performative reinforcement learning where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment. Following the recent literature on performative prediction (Perdomo et al., 2020), we introduce the concept of a performatively stable policy. We then consider a regularized version of the reinforcement learning problem and show that repeatedly optimizing this objective converges to a performatively stable policy under reasonable assumptions on the transition dynamics. Our proof utilizes the dual perspective of the reinforcement learning problem and may be of independent interest in analyzing the convergence of other algorithms with decision-dependent environments. We then extend our results to the setting where the learner performs only gradient ascent steps instead of fully optimizing the objective, and to the setting where the learner has access to a finite number of trajectories from the changed environment. For both settings, we leverage the dual formulation of performative reinforcement learning and establish convergence to a stable solution. Finally, through extensive experiments on a grid-world environment, we demonstrate the dependence of convergence on various parameters, e.g., regularization, smoothness, and the number of samples.
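The repeated-optimization idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or its experimental setup. It assumes a tiny random MDP, uses entropy-regularized (soft) value iteration as a stand-in for the regularized objective (the abstract does not say which regularizer is used), and lets only the rewards, not the transitions, respond to the deployed policy. The names `induced_reward`, `soft_optimal_policy`, and `repeated_retraining`, and the constants `GAMMA`, `TAU`, and `EPS`, are hypothetical.

```python
import numpy as np

# All constants are illustrative assumptions, not values from the paper.
GAMMA = 0.5   # discount factor
TAU = 1.0     # entropy-regularization temperature (stand-in regularizer)
EPS = 0.2     # how strongly the environment reacts to the deployed policy
N_S, N_A = 3, 2

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(N_S), size=(N_S, N_A))  # P[s, a, s'], kept fixed here
R0 = rng.uniform(size=(N_S, N_A))                 # base reward r0[s, a]


def induced_reward(pi):
    """Hypothetical environment response: an action pays less the more it is used."""
    return R0 - EPS * pi


def soft_optimal_policy(r, n_iter=200):
    """Entropy-regularized optimal policy for reward r via soft value iteration."""
    v = np.zeros(N_S)
    for _ in range(n_iter):
        q = r + GAMMA * (P @ v)                   # soft Q-values, shape (N_S, N_A)
        m = q.max(axis=1, keepdims=True)          # shift for a stable log-sum-exp
        v = (m + TAU * np.log(np.exp((q - m) / TAU).sum(axis=1, keepdims=True)))[:, 0]
    q = r + GAMMA * (P @ v)
    z = np.exp((q - q.max(axis=1, keepdims=True)) / TAU)
    return z / z.sum(axis=1, keepdims=True)       # softmax over actions


def repeated_retraining(n_rounds=50):
    """Deploy a policy, observe the induced reward, re-optimize, and repeat."""
    pi = np.full((N_S, N_A), 1.0 / N_A)           # start from the uniform policy
    gaps = []
    for _ in range(n_rounds):
        new_pi = soft_optimal_policy(induced_reward(pi))
        gaps.append(np.abs(new_pi - pi).max())    # change between successive policies
        pi = new_pi
    return pi, gaps


pi_stable, gaps = repeated_retraining()
print("final policy change:", gaps[-1])
```

In this sketch a policy is stable when re-optimizing against the reward it induces returns the same policy. With these particular constants the retraining map is a contraction, so the change between successive policies shrinks toward zero. Making `EPS` larger or `TAU` smaller weakens that argument, which loosely mirrors the abstract's remark that convergence depends on regularization and smoothness.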

