Deep Residual Reinforcement Learning
This work improves the stability and performance of reinforcement learning for robotics and control applications, though the contribution is incremental.
The paper tackles the instability of residual reinforcement learning algorithms by proposing a bidirectional target network technique, which yields a residual version of DDPG that significantly outperforms vanilla DDPG on the DeepMind Control Suite benchmark. It also addresses the distribution mismatch problem in model-based planning with a residual-based method that makes weaker assumptions about the model and yields a greater performance boost than the existing TD($k$) method.
We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMind Control Suite benchmark. Moreover, we find the residual algorithm an effective approach to the distribution mismatch problem in model-based planning. Compared with the existing TD($k$) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.
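To make the terminology concrete, the following is a minimal sketch with a linear value function, not the paper's implementation. It contrasts the standard semi-gradient TD update with the classical residual-gradient update, which also differentiates through the bootstrapped next-state value. The third function is only one plausible reading of a "bidirectional" target network: a frozen copy of the weights supplies whichever value is not being differentiated in each direction. The function names, the mixing weight `eta`, and the exact form of that update are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np


def semi_gradient_step(w, phi_s, phi_next, reward, gamma, lr):
    """Standard TD(0): the bootstrapped target is treated as a constant."""
    delta = reward + gamma * (phi_next @ w) - phi_s @ w
    return w + lr * delta * phi_s


def residual_gradient_step(w, phi_s, phi_next, reward, gamma, lr):
    """Residual gradient: descend the squared TD error w.r.t. both values."""
    delta = reward + gamma * (phi_next @ w) - phi_s @ w
    return w + lr * delta * (phi_s - gamma * phi_next)


def bidirectional_step(w, w_target, phi_s, phi_next, reward, gamma, lr, eta):
    """Assumed bidirectional variant (illustrative, not taken from the paper).

    Forward term: the frozen weights evaluate the next state and the
    gradient flows through the current-state value.
    Backward term: the frozen weights evaluate the current state and the
    gradient flows through the next-state value.
    `eta` (hypothetical) weights the backward term; eta=0 recovers the
    semi-gradient update with a target network.
    """
    delta_fwd = reward + gamma * (phi_next @ w_target) - phi_s @ w
    delta_bwd = reward + gamma * (phi_next @ w) - phi_s @ w_target
    return w + lr * (delta_fwd * phi_s - eta * gamma * delta_bwd * phi_next)


if __name__ == "__main__":
    w = np.zeros(2)
    phi_s = np.array([1.0, 0.0])      # features of the current state
    phi_next = np.array([0.0, 1.0])   # features of the next state
    args = (phi_s, phi_next, 1.0, 0.5, 0.1)  # reward, gamma, lr
    print("semi-gradient:", semi_gradient_step(w, *args))
    print("residual:     ", residual_gradient_step(w, *args))
    print("bidirectional:", bidirectional_step(w, w.copy(), *args, eta=1.0))
```

In this sketch, when the frozen weights equal the online weights and `eta` is 1, the bidirectional update coincides with the plain residual-gradient update; the target copy only changes behavior once it lags behind the online weights.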