Overcoming the Spectral Bias of Neural Value Approximation
This work addresses a key bottleneck in off-policy deep reinforcement learning for robotics and control systems, offering an incremental improvement over existing methods.
The paper tackles the spectral bias in neural value approximation by proposing Fourier feature networks, which achieve state-of-the-art performance on continuous control tasks with reduced compute and improved stability.
Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent work in neural kernel regression suggests the presence of a spectral bias: fitting the high-frequency components of the value function requires exponentially more gradient update steps than fitting the low-frequency ones. In this work, we re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome this bias via a composite neural tangent kernel. With just a single-line change, our approach, Fourier feature networks (FFN), produces state-of-the-art performance on challenging continuous control domains with only a fraction of the compute. Faster convergence and better off-policy stability also make it possible to remove the target network without suffering catastrophic divergence, which further reduces TD(0)'s estimation bias on a few tasks.
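To make the "single-line change" concrete, here is a minimal sketch, assuming the common random Fourier feature mapping gamma(x) = [sin(2*pi*Bx), cos(2*pi*Bx)] with a Gaussian frequency matrix B placed in front of an ordinary ReLU MLP. The helper names (`fourier_features`, `init_ffn`, `ffn_forward`, `td0_target`) and the `scale` parameter are hypothetical and chosen for illustration; the paper's exact parameterization (for example, whether B is trained, how its scale is set, or whether only sine features are used) may differ. The `td0_target` helper shows the bootstrapped TD(0) target, which can be computed from the online network itself when the target network is removed.

```python
import numpy as np

def fourier_features(x, B):
    """Map inputs x of shape (n, d) to [sin(2*pi*x B^T), cos(2*pi*x B^T)], shape (n, 2m).

    B is an (m, d) frequency matrix. Assumption: its entries are drawn from
    N(0, scale^2), so larger scales expose higher frequencies to the MLP.
    """
    proj = 2.0 * np.pi * x @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def init_ffn(in_dim, n_freq, hidden, out_dim, scale, rng):
    # Hypothetical helper: random Fourier feature layer followed by a 2-layer MLP head.
    return {
        "B": rng.normal(0.0, scale, size=(n_freq, in_dim)),
        "W1": rng.normal(0.0, np.sqrt(2.0 / (2 * n_freq)), size=(2 * n_freq, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / hidden), size=(hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def ffn_forward(params, x):
    # The only difference from a plain MLP is this input mapping.
    h = fourier_features(x, params["B"])
    h = np.maximum(0.0, h @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]                 # linear value head

def td0_target(reward, next_q, done, gamma=0.99):
    # TD(0) target r + gamma * (1 - done) * Q(s', a'). Without a target network,
    # next_q comes from the online network rather than a lagged copy.
    return reward + gamma * (1.0 - done) * next_q

rng = np.random.default_rng(0)
params = init_ffn(in_dim=4, n_freq=16, hidden=64, out_dim=1, scale=1.0, rng=rng)
state_actions = rng.normal(size=(8, 4))  # a toy batch of concatenated (s, a) inputs
q_values = ffn_forward(params, state_actions)
print(q_values.shape)  # (8, 1)
```

Because the Fourier mapping only replaces the network's input layer, the rest of an actor-critic pipeline (critic loss, optimizer, replay buffer) can stay unchanged in this sketch.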