LG · AI · May 7

AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning

arXiv:2605.06149 · 44.8
Predicted impact: top 56% in LG · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the limitation of fixed discount factors in deep RL, offering a stable method for state-dependent discounting that improves performance in both simulated and real-world tasks.

AdaGamma introduces a practical deep actor-critic method for state-dependent discounting in RL, using a return-consistency objective to prevent instability. It achieves consistent improvements on continuous-control benchmarks and statistically significant gains in an online A/B test on the JD Logistics platform.

The discount factor in reinforcement learning controls both the effective planning horizon and the strength of bootstrapping, yet most deep RL methods use a single fixed value across all states. While state-dependent discounting is conceptually appealing, naive deep actor-critic implementations can become unstable and degenerate toward TD-error collapse. We propose AdaGamma, a practical deep actor-critic method for state-dependent discounting that learns a state-dependent discount function together with a return-consistency objective to regularize the induced backup structure. On the theory side, we analyze the Bellman operator induced by state-dependent discounting and establish its basic well-posedness properties under suitable conditions. Empirically, AdaGamma integrates into both SAC and PPO, yielding consistent improvements on continuous-control benchmarks, and achieves statistically significant gains in an online A/B test on the JD Logistics platform. These results suggest that state-dependent discounting can be made effective in deep RL when coupled with a return-consistency objective that prevents degenerate target manipulation.
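
To make the idea of a state-dependent discount concrete, the sketch below computes one-step TD targets twice: once with a single fixed discount and once with a per-state discount. It is only an illustration, not the paper's implementation. The function name `td_targets`, the convention that the discount is evaluated at the current state, and all numeric values are assumptions; the return-consistency objective is not shown because the abstract does not specify its form.

```python
import numpy as np

def td_targets(rewards, next_values, gammas, dones):
    """One-step TD targets y = r + gamma(s) * (1 - done) * V(s').

    `gammas` holds one discount per transition. Passing a constant array
    recovers the usual fixed-discount target. (Hypothetical helper; the
    choice of gamma(s) rather than gamma(s') is an assumption.)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Terminal transitions do not bootstrap from the next state's value.
    return rewards + gammas * (1.0 - dones) * next_values

# Illustrative batch of three transitions (made-up numbers).
rewards = [1.0, 0.0, 2.0]
next_values = [10.0, 10.0, 10.0]
dones = [0, 0, 1]

# Fixed discount shared by every state.
fixed = td_targets(rewards, next_values, [0.99, 0.99, 0.99], dones)

# Per-state discounts, standing in for the output of a learned gamma network.
adaptive = td_targets(rewards, next_values, [0.9, 0.99, 0.5], dones)

print(fixed)
print(adaptive)
```

A lower discount in the first state shrinks the bootstrapped part of its target, which shows how a learned discount could also be pushed toward values that merely make targets easy to fit. That is the kind of degenerate target manipulation the abstract says the return-consistency objective is meant to prevent.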

