LG, AI, RO | Jul 26, 2022

Offline Reinforcement Learning at Multiple Frequencies

arXiv:2207.13082v1 | 6 citations | h-index: 93
Originality: Incremental advance
AI Analysis

This addresses a practical issue for robotics researchers dealing with mixed-frequency offline data, but it is an incremental improvement over existing methods.

The paper tackles offline reinforcement learning from heterogeneous data collected at different control frequencies. It shows that a simple method, scaling N in N-step returns with the discretization size, outperforms naïve mixing by 50% on average on three simulated robotic control tasks.

Leveraging many sources of offline robot data requires grappling with the heterogeneity of such data. In this paper, we focus on one particular aspect of heterogeneity: learning from offline data collected at different control frequencies. Across labs, the discretization of controllers, sampling rates of sensors, and demands of a task of interest may differ, giving rise to a mixture of frequencies in an aggregated dataset. We study how well offline reinforcement learning (RL) algorithms can accommodate data with a mixture of frequencies during training. We observe that the $Q$-value propagates at different rates for different discretizations, leading to a number of learning challenges for off-the-shelf offline RL. We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning. By scaling the value of $N$ in $N$-step returns with the discretization size, we effectively balance $Q$-value propagation, leading to more stable convergence. On three simulated robotic control problems, we empirically find that this simple approach outperforms naïve mixing by 50% on average.
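The idea of scaling $N$ with the discretization size can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that $N$ is chosen so that $N \cdot \delta t$ covers a fixed span of physical time, that rewards are weighted by the step size, and that the discount is defined per second; the abstract states none of these details. The names `adaptive_n`, `n_step_target`, `horizon`, and `gamma_per_sec` are hypothetical.

```python
import numpy as np


def adaptive_n(dt, horizon):
    """Pick N so that N * dt spans roughly `horizon` seconds (assumed rule)."""
    return max(1, int(round(horizon / dt)))


def n_step_target(rewards, q_values, dt, horizon, gamma_per_sec=0.99):
    """N-step return target for the first state of a trajectory segment.

    rewards[k]  : reward observed at step k (a per-second rate, assumed)
    q_values[k] : value estimate for the state reached after k steps
    dt          : discretization size of this trajectory, in seconds
    """
    rewards = np.asarray(rewards, dtype=float)
    # Never look past the end of the available segment.
    n = min(adaptive_n(dt, horizon), len(rewards))
    # Per-step discount derived from a per-second discount (assumption).
    gamma = gamma_per_sec ** dt
    discounts = gamma ** np.arange(n)
    # Weight each reward by dt so returns are comparable across frequencies.
    n_step_return = float(np.sum(discounts * rewards[:n] * dt))
    # Bootstrap from the value estimate n steps ahead.
    return n_step_return + (gamma ** n) * float(q_values[n]), n


# Two trajectories of the same task recorded at different frequencies.
fast_target, fast_n = n_step_target(np.ones(20), np.zeros(21), dt=0.01, horizon=0.1)
slow_target, slow_n = n_step_target(np.ones(4), np.zeros(5), dt=0.05, horizon=0.1)
print(fast_n, slow_n)
```

Under these assumptions the finely discretized trajectory backs up over more steps than the coarse one, so both targets look ahead over the same span of physical time.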

