LG | May 21, 2025

RLBenchNet: The Right Network for the Right Reinforcement Learning Task

arXiv:2505.15040v1 | 2 citations | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides incremental guidance for researchers and practitioners to select appropriate neural network architectures based on task characteristics and computational constraints.

The study systematically evaluates neural network architectures for reinforcement learning tasks. It finds that MLPs are efficient for fully observable continuous control and that recurrent models such as LSTM and GRU handle partial observability well. Mamba models achieve up to 4.5x higher throughput than LSTM (3.9x over GRU) with comparable performance, and Mamba-2 uses 8x less memory than Transformer-XL on memory-intensive tasks.

Reinforcement learning (RL) has seen significant advancements through the application of various neural network architectures. In this study, we systematically investigate the performance of several neural networks in RL tasks, including Long Short-Term Memory (LSTM), Multi-Layer Perceptron (MLP), Mamba/Mamba-2, Transformer-XL, Gated Transformer-XL, and Gated Recurrent Unit (GRU). Through comprehensive evaluation across continuous control, discrete decision-making, and memory-based environments, we identify architecture-specific strengths and limitations. Our results reveal that: (1) MLPs excel in fully observable continuous control tasks, providing an optimal balance of performance and efficiency; (2) recurrent architectures like LSTM and GRU offer robust performance in partially observable environments with moderate memory requirements; (3) Mamba models achieve a 4.5x higher throughput compared to LSTM and a 3.9x increase over GRU, all while maintaining comparable performance; and (4) only Transformer-XL, Gated Transformer-XL, and Mamba-2 successfully solve the most challenging memory-intensive tasks, with Mamba-2 requiring 8x less memory than Transformer-XL. These findings provide insights for researchers and practitioners, enabling more informed architecture selection based on specific task characteristics and computational constraints. Code is available at: https://github.com/SafeRL-Lab/RLBenchNet
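The four findings in the abstract can be read as a rough decision guide. The sketch below encodes them as a small lookup function. It is illustrative only: `TaskProfile`, its field names, and `suggest_architectures` are hypothetical and do not come from the paper or its repository, and the mapping simplifies the abstract's qualitative claims.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of an RL task (field names are assumptions)."""
    fully_observable: bool
    memory_demand: str = "none"       # one of "none", "moderate", "high"
    throughput_critical: bool = False
    memory_constrained: bool = False  # tight memory budget on the training device


def suggest_architectures(task: TaskProfile) -> List[str]:
    """Return candidate architectures following the abstract's findings (1)-(4)."""
    if task.memory_demand not in ("none", "moderate", "high"):
        raise ValueError(f"unknown memory_demand: {task.memory_demand!r}")

    if task.memory_demand == "high":
        # Finding (4): only these three solved the hardest memory tasks;
        # Mamba-2 needed 8x less memory than Transformer-XL.
        if task.memory_constrained:
            return ["Mamba-2"]
        return ["Transformer-XL", "Gated Transformer-XL", "Mamba-2"]

    if task.fully_observable and task.memory_demand == "none":
        # Finding (1): stated for fully observable continuous control.
        return ["MLP"]

    if task.throughput_critical:
        # Finding (3): Mamba reports 4.5x (vs LSTM) and 3.9x (vs GRU) throughput.
        return ["Mamba"]

    # Finding (2): recurrent models for partial observability, moderate memory.
    return ["LSTM", "GRU"]
```

For example, `suggest_architectures(TaskProfile(fully_observable=False, memory_demand="moderate"))` falls through to the recurrent branch. Such a lookup is only a starting point; the benchmark results themselves should guide any final choice.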

Code Implementations: 1 repo