LG | AI | Oct 14, 2022

A Scalable Finite Difference Method for Deep Reinforcement Learning

arXiv:2210.07487v2 | h-index: 3
AI Analysis

This work addresses scalability issues facing researchers and practitioners who use distributed black-box optimization in reinforcement learning. Its contribution is incremental, since it builds on existing finite difference methods.

The paper tackles idle time and wasted computation in distributed finite difference methods for deep reinforcement learning. It introduces a method for reusing older data in the update, which yields a scalable algorithm that avoids significant idle time or wasted computation.

Several low-bandwidth distributable black-box optimization algorithms in the family of finite differences such as Evolution Strategies have recently been shown to perform nearly as well as tailored Reinforcement Learning methods in some Reinforcement Learning domains. One shortcoming of these black-box methods is that they must collect information about the structure of the return function at every update, and can often employ only information drawn from a distribution centered around the current parameters. As a result, when these algorithms are distributed across many machines, a significant portion of total runtime may be spent with many machines idle, waiting for a final return and then for an update to be calculated. In this work we introduce a novel method to use older data in finite difference algorithms, which produces a scalable algorithm that avoids significant idle time or wasted computation.
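To make the bottleneck described in the abstract concrete, here is a minimal sketch of a standard antithetic finite-difference (Evolution Strategies style) update. It is the generic baseline, not the paper's method for reusing older data, which the abstract does not specify. The names `es_gradient_estimate`, `es_step`, and `toy_return`, and the values of `sigma`, `num_pairs`, and `lr`, are assumptions chosen for illustration.

```python
import random


def es_gradient_estimate(return_fn, theta, sigma=0.1, num_pairs=50, rng=None):
    """Antithetic finite-difference estimate of the gradient of return_fn at theta.

    Every perturbation is drawn from a Gaussian centered on the current
    parameters, so all 2 * num_pairs returns must be collected again after
    each update. In a distributed run, each return would come from a worker.
    """
    rng = rng or random.Random(0)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = return_fn([t + sigma * e for t, e in zip(theta, eps)])
        minus = return_fn([t - sigma * e for t, e in zip(theta, eps)])
        # Central difference along the random direction eps, averaged over pairs.
        scale = (plus - minus) / (2.0 * sigma * num_pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    return grad


def es_step(return_fn, theta, lr=0.05, **kwargs):
    """One gradient-ascent step; it cannot start until every return has arrived."""
    grad = es_gradient_estimate(return_fn, theta, **kwargs)
    return [t + lr * g for t, g in zip(theta, grad)]


def toy_return(theta):
    # Hypothetical stand-in for an episode return, maximized at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    for _ in range(200):
        theta = es_step(toy_return, theta, rng=rng)
    print(theta, toy_return(theta))
```

In this sketch the loop over pairs runs serially. In a distributed setting each `return_fn` call would be an episode on a separate machine, and `es_step` would wait for the slowest one before computing the update. That waiting is the idle time the abstract refers to.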
