LG AI OC
Dec 23, 2025

Performative Policy Gradient: Optimality in Performative Reinforcement Learning

Debabrota Basu, Udvas Das, Brahim Driss, Uddalak Mukherjee

arXiv:2512.20576v1
7.11 citations
h-index: 4

Originality: Highly original

AI Analysis

This paper addresses a key challenge for reinforcement learning in applications where the deployed algorithm itself causes distribution shift. It offers a novel solution with proven optimality guarantees, though it builds on prior work in performative settings.

The paper tackles a problem that standard methods ignore: reinforcement learning algorithms influence their environments after deployment. It introduces the Performative Policy Gradient algorithm (PePG), which provably converges to performatively optimal policies and outperforms existing methods in empirical tests.

After deployment, machine learning algorithms often influence the environments they act in, and thus shift the underlying dynamics, a shift that standard reinforcement learning (RL) methods ignore. While designing optimal algorithms in this performative setting has recently been studied in supervised learning, the RL counterpart remains under-explored. In this paper, we prove the performative counterparts of the performance difference lemma and the policy gradient theorem in RL, and further introduce the Performative Policy Gradient algorithm (PePG). PePG is the first policy gradient algorithm designed to account for performativity in RL. Under softmax parametrisation, both with and without entropy regularisation, we prove that PePG converges to performatively optimal policies, i.e. policies that remain optimal under the distribution shifts they themselves induce. Thus, PePG significantly extends prior work in performative RL, which achieves performative stability but not optimality. Furthermore, our empirical analysis on standard performative RL environments validates that PePG outperforms both standard policy gradient algorithms and existing performative RL algorithms that aim for stability.
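The gap between performative stability and performative optimality can be seen in a toy example. The sketch below is not PePG and is not taken from the paper. It assumes a hypothetical one-state, two-action problem in which deploying action 1 more often erodes that action's reward. The names `R0`, `C`, `reward_action1`, and `train` are illustrative assumptions. It runs exact gradient ascent on a softmax policy twice: once ignoring the policy's effect on the reward, and once adding the chain-rule term through it.

```python
import math

# Hypothetical toy problem (an assumption for illustration, not from the paper).
R0 = 0.5   # reward of action 0, unaffected by the deployed policy
C = 1.0    # how strongly deploying action 1 erodes its own reward


def prob_action1(theta):
    """Softmax over two actions reduces to a sigmoid of the logit gap theta."""
    return 1.0 / (1.0 + math.exp(-theta))


def reward_action1(p_deployed):
    """Reward of action 1 in the environment induced by deploying probability p."""
    return 1.0 - C * p_deployed


def performative_value(p):
    """Expected reward of the policy in the environment that it itself induces."""
    return p * reward_action1(p) + (1.0 - p) * R0


def train(account_for_shift, theta=1.0, lr=1.0, steps=5000):
    """Exact gradient ascent on the single logit gap theta."""
    for _ in range(steps):
        p = prob_action1(theta)
        # Standard policy-gradient term: the environment is treated as fixed.
        grad_p = reward_action1(p) - R0
        if account_for_shift:
            # Extra term: derivative of the reward with respect to the
            # deployed policy, weighted by how often action 1 is played.
            grad_p += p * (-C)
        # Chain rule through the sigmoid: dp/dtheta = p * (1 - p).
        theta += lr * p * (1.0 - p) * grad_p
    return prob_action1(theta)


p_stable = train(account_for_shift=False)
p_optimal = train(account_for_shift=True)

print("shift-agnostic:", p_stable, performative_value(p_stable))
print("shift-aware:   ", p_optimal, performative_value(p_optimal))
```

In this toy problem the shift-agnostic run settles where both actions pay the same in the induced environment (probability 0.5 of action 1, value 0.5). That point is stable but not optimal. The shift-aware run approaches probability 0.25, where the performative value 0.5 + 0.5p - p^2 peaks at 0.5625.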
