LG | AI | Oct 14, 2025

Finite-time Convergence Analysis of Actor-Critic with Evolving Reward

arXiv:2510.12334v1 | h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap for practitioners using evolving reward techniques in reinforcement learning, though it is incremental as it extends existing analysis to a more general setting.

The paper tackles the lack of theoretical foundations for reinforcement learning algorithms with evolving reward functions, as arise in reward shaping or entropy regularization. It provides the first finite-time convergence analysis of a single-timescale actor-critic algorithm under Markovian sampling in this setting, and shows that an O(1/√T) convergence rate, matching the best-known rate for static rewards, is achievable when the reward parameters evolve slowly enough.

Many popular practical reinforcement learning (RL) algorithms employ evolving reward functions (through techniques such as reward shaping, entropy regularization, or curriculum learning), yet their theoretical foundations remain underdeveloped. This paper provides the first finite-time convergence analysis of a single-timescale actor-critic algorithm in the presence of an evolving reward function under Markovian sampling. We consider a setting where the reward parameters may change at each time step, affecting both policy optimization and value estimation. Under standard assumptions, we derive non-asymptotic bounds for both actor and critic errors. Our result shows that an $O(1/\sqrt{T})$ convergence rate is achievable, matching the best-known rate for static rewards, provided the reward parameters evolve slowly enough. This rate is preserved when the reward is updated via a gradient-based rule with bounded gradient and on the same timescale as the actor and critic, offering a theoretical foundation for many popular RL techniques. As a secondary contribution, we introduce a novel analysis of distribution mismatch under Markovian sampling, improving the best-known rate by a factor of $\log^2 T$ in the static-reward case.
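
To make the setting concrete, the following is a minimal illustrative sketch, not the paper's algorithm or experimental setup. It runs a tabular actor-critic along a single Markovian trajectory, with the actor, the critic, and a scalar reward parameter all updated with the same step size 1/√T. Everything specific here is an assumption made for illustration: the random 3-state MDP, the reward form `base_reward + phi * shaping`, the clipped update direction for `phi`, and the function name `run_actor_critic`.

```python
import numpy as np

def run_actor_critic(T=5000, seed=0):
    """Toy single-timescale actor-critic with a slowly evolving reward."""
    rng = np.random.default_rng(seed)
    n_states, n_actions, gamma = 3, 2, 0.9

    # Hypothetical random MDP: P[s, a] is a distribution over next states.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    base_reward = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    shaping = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))

    # Single timescale: actor, critic, and reward share one step size.
    step = 1.0 / np.sqrt(T)

    theta = np.zeros((n_states, n_actions))  # actor: softmax logits
    v = np.zeros(n_states)                   # critic: tabular values
    phi = 1.0                                # reward parameter (assumed scalar)

    s = 0  # one continuing trajectory, no resets (Markovian sampling)
    for _ in range(T):
        logits = theta[s] - theta[s].max()
        pi = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(n_actions, p=pi)

        # The reward depends on the current reward parameter phi.
        r = base_reward[s, a] + phi * shaping[s, a]
        s_next = rng.choice(n_states, p=P[s, a])

        # Critic: TD(0) update using the evolving reward.
        delta = r + gamma * v[s_next] - v[s]
        v[s] += step * delta

        # Actor: policy-gradient step with the TD error as the advantage.
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta[s] += step * delta * grad_log_pi

        # Reward: bounded update direction (assumed rule that fades out
        # the shaping term), on the same timescale as actor and critic.
        g = np.clip(-phi, -1.0, 1.0)
        phi += step * g

        s = s_next

    return theta, v, phi
```

Calling `run_actor_critic(T=5000)` returns the final logits, value estimates, and reward parameter. Because the update direction for `phi` is clipped to [-1, 1], the reward parameter moves by at most 1/√T per step, which is one simple way to realize the "evolves slowly enough" condition described in the abstract.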

