AR | ET | LG | Jul 16, 2022

Associative Memory Based Experience Replay for Deep Reinforcement Learning

arXiv:2207.07791v1 | 10 citations | h-index: 16
Originality: Incremental advance
AI Analysis

The paper addresses a performance bottleneck for DRL practitioners by reducing the latency of experience replay, though it is an incremental improvement focused on hardware optimization.

The paper tackles the latency overhead of prioritized experience replay (PER) in deep reinforcement learning by proposing AMPER, a hardware-software co-design built on associative memory. Running on the proposed hardware, AMPER achieves comparable learning performance with a 55x to 270x latency improvement over state-of-the-art PER running on GPU.

Experience replay is an essential component in deep reinforcement learning (DRL), which stores the experiences and generates experiences for the agent to learn in real time. Recently, prioritized experience replay (PER) has been proven to be powerful and widely deployed in DRL agents. However, implementing PER on traditional CPU or GPU architectures incurs significant latency overhead due to its frequent and irregular memory accesses. This paper proposes a hardware-software co-design approach to design an associative memory (AM) based PER, AMPER, with an AM-friendly priority sampling operation. AMPER replaces the widely-used time-costly tree-traversal-based priority sampling in PER while preserving the learning performance. Further, we design an in-memory computing hardware architecture based on AM to support AMPER by leveraging parallel in-memory search operations. AMPER shows comparable learning performance while achieving 55x to 270x latency improvement when running on the proposed hardware compared to the state-of-the-art PER running on GPU.
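To make the baseline concrete, the tree-traversal-based priority sampling that the abstract says AMPER replaces is commonly implemented as a sum tree. The block below is a minimal sketch of that conventional technique, not of AMPER or its hardware. The class name `SumTree`, its method names, and the power-of-two capacity restriction are assumptions made for illustration.

```python
import random


class SumTree:
    """Illustrative sum tree: leaves hold priorities, internal nodes hold sums.

    Assumption for this sketch: capacity is a power of two, so the tree is
    perfectly balanced and the top-down descent below is valid.
    """

    def __init__(self, capacity):
        if capacity < 1 or capacity & (capacity - 1) != 0:
            raise ValueError("capacity must be a power of two in this sketch")
        self.capacity = capacity
        # Node 1 is the root; nodes capacity..2*capacity-1 are the leaves.
        self.tree = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of one experience and refresh the sums above it."""
        pos = index + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities, stored at the root."""
        return self.tree[1]

    def find(self, value):
        """Walk from the root to the leaf whose cumulative range contains value."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.capacity

    def sample(self, rng=random):
        """Draw one index with probability proportional to its priority."""
        return self.find(rng.uniform(0.0, self.total()))


# Usage: four stored experiences with priorities 1, 2, 3, 4.
tree = SumTree(4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)
batch = [tree.sample() for _ in range(8)]
```

Each call to `find` performs a chain of dependent reads, one per tree level, and each `update` rewrites every ancestor of the changed leaf. These are the frequent and irregular memory accesses the abstract identifies as the source of latency on CPU and GPU.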

