LG · AI · May 20, 2025

Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation

arXiv:2505.14821v1 · 1 citation · h-index: 3 · Has Code · UAI
Originality: Highly original
AI Analysis

This work addresses a theoretical gap in continuous-time RL by providing the first sample complexity guarantee with general function approximation, though its approach appears to build incrementally on existing optimism-based methods.

The paper addresses the limited theoretical understanding of continuous-time reinforcement learning with general function approximation. It proposes a model-based algorithm that is both sample-efficient and computationally efficient, and shows that a near-optimal policy can be learned with a suboptimality gap of Õ(√(d_R + d_F) · N^{-1/2}) using N measurements, where d_R and d_F are the distributional Eluder dimensions of the reward and dynamics functions.

Continuous-time reinforcement learning (CTRL) provides a principled framework for sequential decision-making in environments where interactions evolve continuously over time. Despite its empirical success, the theoretical understanding of CTRL remains limited, especially in settings with general function approximation. In this work, we propose a model-based CTRL algorithm that achieves both sample and computational efficiency. Our approach leverages optimism-based confidence sets to establish the first sample complexity guarantee for CTRL with general function approximation, showing that a near-optimal policy can be learned with a suboptimality gap of $\tilde{O}(\sqrt{d_{\mathcal{R}} + d_{\mathcal{F}}}N^{-1/2})$ using $N$ measurements, where $d_{\mathcal{R}}$ and $d_{\mathcal{F}}$ denote the distributional Eluder dimensions of the reward and dynamics functions, respectively, capturing the complexity of general function approximation in reinforcement learning. Moreover, we introduce structured policy updates and an alternative measurement strategy that significantly reduce the number of policy updates and rollouts while maintaining competitive sample efficiency. We conduct experiments on continuous control tasks and diffusion model fine-tuning to support our proposed algorithms, demonstrating comparable performance with significantly fewer policy updates and rollouts.
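
To make the stated rate concrete, the following is a minimal sketch of how the bound scales, assuming all constants and logarithmic factors hidden by the Õ notation are dropped. The function names and the example values of d_R, d_F, and N are hypothetical and do not come from the paper; the sketch only illustrates the arithmetic of the rate, not the algorithm itself.

```python
import math


def suboptimality_scale(d_reward, d_dynamics, n_measurements):
    """Leading-order scale sqrt(d_R + d_F) * N^{-1/2}.

    Assumption: constants and log factors hidden in the O-tilde are omitted,
    so this is a scaling illustration, not the paper's actual bound.
    """
    return math.sqrt(d_reward + d_dynamics) / math.sqrt(n_measurements)


def measurements_for_gap(d_reward, d_dynamics, epsilon):
    """Smallest N for which the leading-order scale is at most epsilon.

    Obtained by inverting the rate: N >= (d_R + d_F) / epsilon^2.
    """
    return math.ceil((d_reward + d_dynamics) / epsilon ** 2)


# Hypothetical Eluder dimensions, chosen only for illustration.
d_r, d_f = 10, 40

scale = suboptimality_scale(d_r, d_f, 5000)
scale_4x = suboptimality_scale(d_r, d_f, 20000)
needed = measurements_for_gap(d_r, d_f, 0.25)

print(f"scale at N=5000:  {scale:.4f}")
print(f"scale at N=20000: {scale_4x:.4f}")
print(f"N needed for gap 0.25: {needed}")
```

Under these assumptions, quadrupling the number of measurements halves the gap, and reaching a target gap ε takes on the order of (d_R + d_F)/ε² measurements.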

Code Implementations: 1 repo
