Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control
This work addresses the need for computationally efficient RL algorithms suitable for on-device finetuning, for example in Sim2Real transfer, by designing streaming algorithms that are compatible with batch RL methods.
The authors propose two streaming deep RL algorithms, S2AC and SDAC, that achieve performance comparable to state-of-the-art streaming baselines on continuous control benchmarks without per-environment hyperparameter tuning, and introduce a principled method for preserving pre-trained policy performance during batch-to-streaming transition.
State-of-the-art deep reinforcement learning (RL) methods have achieved remarkable performance in continuous control tasks, yet their computational complexity is often incompatible with the constraints of resource-limited hardware, due to their reliance on replay buffers, batch updates, and target networks. The emerging paradigm of streaming deep RL addresses this limitation through purely online updates, achieving strong empirical performance on standard benchmarks. In this work, we propose two novel streaming deep RL algorithms, Streaming Soft Actor-Critic (S2AC) and Streaming Deterministic Actor-Critic (SDAC), explicitly designed to be compatible with state-of-the-art batch RL methods, making them particularly suitable for on-device finetuning applications such as Sim2Real transfer. Both algorithms achieve performance comparable to state-of-the-art streaming baselines on standard benchmarks without requiring tedious per-environment hyperparameter tuning. We further investigate the batch-to-streaming transition, showing that a naive transition does not guarantee preservation of pre-trained policy performance, and propose a principled approach to address this challenge.
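To make the contrast between batch and streaming updates concrete, the following is a minimal sketch of a purely online update: each transition is used once, as soon as it is observed, and then discarded, with no replay buffer, no mini-batch, and no target network. It is not S2AC or SDAC. It is a generic semi-gradient TD(0) value update with a linear function approximator, chosen only because it is the simplest instance of the streaming pattern. The function names, the two-state toy chain, and the step size and discount values are all assumptions made for illustration.

```python
def streaming_td0_update(weights, features, reward, next_features, done,
                         step_size=0.1, discount=0.99):
    """Apply one semi-gradient TD(0) update from a single transition.

    Illustrative only: a linear value function updated online, with the
    bootstrap target computed from the current weights (no target network).
    """
    value = sum(w * x for w, x in zip(weights, features))
    next_value = 0.0 if done else sum(
        w * x for w, x in zip(weights, next_features))
    td_error = reward + discount * next_value - value
    new_weights = [w + step_size * td_error * x
                   for w, x in zip(weights, features)]
    return new_weights, td_error


def run_toy_chain(num_episodes=500, discount=0.99):
    """Hypothetical two-state chain: s0 -> s1 -> terminal, reward 1 at the end."""
    one_hot = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    weights = [0.0, 0.0]
    for _ in range(num_episodes):
        # Transition s0 -> s1 with reward 0; it is used once and not stored.
        weights, _ = streaming_td0_update(
            weights, one_hot[0], 0.0, one_hot[1], done=False,
            discount=discount)
        # Transition s1 -> terminal with reward 1.
        weights, _ = streaming_td0_update(
            weights, one_hot[1], 1.0, [0.0, 0.0], done=True,
            discount=discount)
    return weights


if __name__ == "__main__":
    print(run_toy_chain())
```

In this toy chain the learned values approach the discounted return of each state (the discount factor for the first state and 1 for the second). The memory footprint is constant in the number of environment steps, which is the property that makes streaming updates attractive on resource-limited hardware.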