LG · OC · Oct 16, 2025

Policy Transfer for Continuous-Time Reinforcement Learning: A (Rough) Differential Equation Approach

arXiv:2510.15165v2
h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses efficient policy learning in continuous-time RL. For researchers and practitioners, it offers an incremental theoretical advance with practical algorithmic benefits.

The paper tackles policy transfer for continuous-time reinforcement learning. It gives the first theoretical proof that an optimal policy learned for one problem can initialize the search for a near-optimal policy in a closely related problem while maintaining the convergence rate of the original algorithm. It illustrates the benefit with a novel policy learning algorithm for continuous-time LQRs that achieves global linear and local super-linear convergence.

This paper studies policy transfer, one of the well-known transfer learning techniques adopted in large language models, for two classes of continuous-time reinforcement learning problems. In the first class of continuous-time linear-quadratic systems with Shannon's entropy regularization (a.k.a. LQRs), we fully exploit the Gaussian structure of their optimal policy and the stability of their associated Riccati equations. In the second class where the system has possibly non-linear and bounded dynamics, the key technical component is the stability of diffusion SDEs which is established by invoking the rough path theory. Our work provides the first theoretical proof of policy transfer for continuous-time RL: an optimal policy learned for one RL problem can be used to initialize the search for a near-optimal policy in a closely related RL problem, while maintaining the convergence rate of the original algorithm. To illustrate the benefit of policy transfer for RL, we propose a novel policy learning algorithm for continuous-time LQRs, which achieves global linear convergence and local super-linear convergence. As a byproduct of our analysis, we derive the stability of a concrete class of continuous-time score-based diffusion models via their connection with LQRs.
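The warm-start idea in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it swaps in the classical Kleinman (Newton-type) policy iteration for a scalar, deterministic continuous-time LQR, and it ignores entropy regularization by tracking only the feedback gain. The system `dx = (a x + b u) dt`, the cost `integral of (q x^2 + r u^2) dt`, the parameter values, and all function names are assumptions made for illustration.

```python
import math


def optimal_gain(a, b, q, r):
    """Closed-form optimal feedback gain k (u = -k x) for the scalar LQR.

    Solves the scalar Riccati equation 2*a*p - (b**2 / r) * p**2 + q = 0
    for its positive root p, then returns k = b * p / r.
    """
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


def kleinman_step(k, a, b, q, r):
    """One policy-iteration step: evaluate gain k, then improve it."""
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("gain is not stabilizing for this system")
    # Policy evaluation: scalar Lyapunov equation
    # 2 * (a - b*k) * p + q + r * k**2 = 0.
    p = (q + r * k * k) / (-2.0 * closed_loop)
    # Policy improvement.
    return b * p / r


def iterations_to_converge(k0, a, b, q, r, tol=1e-10, max_iter=100):
    """Count Kleinman steps needed to reach the optimal gain from k0."""
    k_star = optimal_gain(a, b, q, r)
    k = k0
    for n in range(max_iter + 1):
        if abs(k - k_star) <= tol:
            return n, k
        k = kleinman_step(k, a, b, q, r)
    raise RuntimeError("did not converge within max_iter steps")


# Source problem (hypothetical parameters) and its optimal gain.
k_source = optimal_gain(a=1.0, b=1.0, q=1.0, r=1.0)

# Closely related target problem: only the drift coefficient changes.
target = dict(a=1.1, b=1.0, q=1.0, r=1.0)

# Warm start: initialize with the source problem's optimal gain.
n_warm, k_warm = iterations_to_converge(k_source, **target)
# Cold start: an arbitrary stabilizing gain far from the optimum.
n_cold, k_cold = iterations_to_converge(50.0, **target)

print("warm-start iterations:", n_warm)
print("cold-start iterations:", n_cold)
```

In this toy setting the transferred gain already stabilizes the target system and lies close to its optimum, so the iteration starts in the fast local regime and needs fewer steps than the cold start. The transfer only makes sense when the source gain is stabilizing for the target problem, which `kleinman_step` checks explicitly.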
