LG | AI | OC | ML | Sep 28, 2025

Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization

arXiv:2509.23711v1
h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses stability issues in continuous-time RL for control applications, though it is incremental as it builds on existing policy gradient methods.

The paper addresses the challenge of extending discrete-time reinforcement learning algorithms to continuous-time settings, where sensitivity to time discretization causes instability and slow convergence. It proposes CT-DDPG, a deterministic policy gradient method that, in the reported experiments, showed improved stability and faster convergence on control tasks.

The theory of discrete-time reinforcement learning (RL) has advanced rapidly over the past decades. Although RL algorithms are primarily designed for discrete environments, many real-world RL applications are inherently continuous and complex. A major challenge in extending discrete-time algorithms to continuous-time settings is their sensitivity to time discretization, often leading to poor stability and slow convergence. In this paper, we investigate deterministic policy gradient methods for continuous-time RL. We derive a continuous-time policy gradient formula based on an analogue of the advantage function and establish its martingale characterization. This theoretical foundation leads to our proposed algorithm, CT-DDPG, which enables stable learning with deterministic policies in continuous-time environments. Numerical experiments show that the proposed CT-DDPG algorithm offers improved stability and faster convergence compared to existing discrete-time and continuous-time methods, across a wide range of control tasks with varying time discretizations and noise levels.
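
The sensitivity to time discretization mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's CT-DDPG algorithm. It is a toy example under assumed choices that do not come from the paper: a deterministic 1-D system dx/dt = a, reward rate -(x^2 + a^2), discount rate BETA = 1, and the fixed policy a = -x, whose continuous-time value function is V(x) = -2x^2 / (BETA + 2). All function names are hypothetical. The sketch shows that the gap between discrete-time Q-values of two actions shrinks in proportion to the time step dt, while the gap divided by dt converges to a step-independent rate, which is one common way to motivate a continuous-time analogue of the advantage function.

```python
import math

BETA = 1.0  # discount rate (assumed for this toy example)


def value(x):
    """Continuous-time value of the policy a = -x (closed form for this toy problem)."""
    return -2.0 * x**2 / (BETA + 2.0)


def q_discrete(x, a, dt):
    """One-step Q-value with step dt: hold action a for dt, then follow the policy.

    Uses the continuous-time value as the bootstrap target, which is an
    approximation that is accurate up to higher-order terms in dt.
    """
    reward_rate = -(x**2 + a**2)
    return reward_rate * dt + math.exp(-BETA * dt) * value(x + a * dt)


def action_gap(x, a1, a2, dt):
    """Difference between the discrete-time Q-values of two actions."""
    return q_discrete(x, a1, dt) - q_discrete(x, a2, dt)


def advantage_rate(x, a):
    """Limit of (Q_dt(x, a) - V(x)) / dt as dt -> 0 for this toy problem."""
    dvalue_dx = -4.0 * x / (BETA + 2.0)
    return -(x**2 + a**2) + dvalue_dx * a - BETA * value(x)


if __name__ == "__main__":
    x, a_policy, a_other = 1.0, -1.0, 0.5
    for dt in (1e-1, 1e-2, 1e-3, 1e-4):
        gap = action_gap(x, a_policy, a_other, dt)
        print(f"dt={dt:g}  gap={gap:.6f}  gap/dt={gap / dt:.4f}")
    limit = advantage_rate(x, a_policy) - advantage_rate(x, a_other)
    print(f"limit of gap/dt: {limit:.4f}")
```

In this toy setting the raw Q-value gap vanishes as dt shrinks, so a critic trained on Q-values at fine time steps has little signal for ranking actions, whereas the rescaled quantity stays informative.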
