NE · AI · Jul 6, 2023

A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations

Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed Afshar

arXiv:2307.02947v24.92 citations · h-index: 36

Originality: Incremental advance

AI Analysis

This work contributes to the development of more hardware-efficient RL solutions, addressing a domain-specific problem for neuromorphic computing and robotics.

The paper tackles the challenge of implementing reinforcement learning in hardware-efficient and bio-inspired ways by presenting a novel spiking neural network architecture for RL with real-valued observations. The network outperforms a tabular actor-critic benchmark and discovers stable control policies on three classic environments: mountain car, cart-pole, and acrobot.

Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations. Building upon prior work, the proposed model incorporates multi-layered event-based clustering and adds Temporal Difference (TD)-error modulation and eligibility traces. An ablation study confirms that these components have a significant impact on the model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on three classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. It requires neither an external memory buffer nor a global error-gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcast TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.
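
For readers unfamiliar with the tabular benchmark named in the abstract, the following is a minimal sketch of an actor-critic learner with eligibility traces, written in plain Python. It is not the authors' implementation. The uniform-grid `discretize` helper, the class name `TabularActorCritic`, the softmax policy, and all hyperparameter values are assumptions made for illustration, and the sketch omits the per-step discount factor that some textbook versions apply to the actor trace. It does show the property the abstract emphasizes: every update is online and driven by a single scalar TD error, with no replay buffer and no backpropagated gradient.

```python
import math
import random


def discretize(obs, low, high, bins):
    """Map a real-valued observation to one flat table index (uniform grid).

    Hypothetical helper: a tabular method needs some discretization of
    real-valued observations; a uniform grid is only one possible choice.
    """
    index = 0
    for x, lo, hi, n in zip(obs, low, high, bins):
        cell = int((x - lo) / (hi - lo) * n)
        cell = min(max(cell, 0), n - 1)  # clamp out-of-range observations
        index = index * n + cell         # row-major flattening
    return index


class TabularActorCritic:
    """Actor-critic with accumulating eligibility traces (illustrative)."""

    def __init__(self, n_states, n_actions, alpha_v=0.1, alpha_pi=0.1,
                 gamma=0.99, lam=0.9, seed=0):
        self.v = [0.0] * n_states                                   # critic
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]   # actor
        self.e_v = [0.0] * n_states
        self.e_pi = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha_v, self.alpha_pi = alpha_v, alpha_pi
        self.gamma, self.lam = gamma, lam
        self.rng = random.Random(seed)

    def policy(self, s):
        """Softmax over the action preferences of state s."""
        m = max(self.prefs[s])
        exps = [math.exp(p - m) for p in self.prefs[s]]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, s):
        probs = self.policy(s)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def reset_traces(self):
        """Call at the start of each episode."""
        self.e_v = [0.0] * len(self.e_v)
        self.e_pi = [[0.0] * len(row) for row in self.e_pi]

    def update(self, s, a, r, s_next, done):
        """One online step; returns the TD error that modulates all updates."""
        target = r if done else r + self.gamma * self.v[s_next]
        delta = target - self.v[s]
        probs = self.policy(s)
        decay = self.gamma * self.lam
        # Decay every trace, then mark the state and action just visited.
        for i in range(len(self.v)):
            self.e_v[i] *= decay
            for j in range(len(self.e_pi[i])):
                self.e_pi[i][j] *= decay
        self.e_v[s] += 1.0
        for j, p in enumerate(probs):
            # Gradient of log softmax with respect to the preferences.
            self.e_pi[s][j] += (1.0 if j == a else 0.0) - p
        # The same scalar delta scales both the critic and actor updates.
        for i in range(len(self.v)):
            self.v[i] += self.alpha_v * delta * self.e_v[i]
            for j in range(len(self.prefs[i])):
                self.prefs[i][j] += self.alpha_pi * delta * self.e_pi[i][j]
        return delta
```

In use, each real-valued observation would first pass through `discretize` to obtain a state index, followed by `act` to pick an action and `update` after the environment returns a reward. The eligibility traces let one TD error assign credit to recently visited states, which is the role the abstract attributes to traces in the spiking model as well.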
