LG | Jun 8, 2023

Decision S4: Efficient Sequence-Based RL via State Spaces Layers

Meta AI
arXiv:2306.05167v1 | 34 citations | h-index: 38
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making reinforcement learning more efficient and scalable for real-world applications, although it is an incremental improvement on existing sequence learning methods.

The paper tackles the inefficiency of sequence-based reinforcement learning by proposing Decision S4, which uses state-space layers to model long-range dependencies more effectively than transformers. The results show that it outperforms Decision Transformers and other baselines on most tasks while reducing latency, parameter count, and training time by several orders of magnitude.
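The latency and training-time claims rest on a property of linear state-space layers: the same layer can be unrolled as a recurrence, which costs a constant amount of work per new timestep at inference, or expanded into a long convolution that is evaluated over a whole sequence at once during training. The sketch below illustrates that equivalence under simplifying assumptions: a diagonal complex state matrix (not S4's structured parameterization), zero-order-hold discretization, and a real-part readout. The function names and parameter values are hypothetical, and this is not the paper's layer.

```python
import cmath
from typing import List, Sequence, Tuple


def discretize_zoh(a: complex, b: complex, dt: float) -> Tuple[complex, complex]:
    """Zero-order-hold discretization of the scalar ODE x'(t) = a*x(t) + b*u(t).

    Assumes a != 0 (for a stable mode, Re(a) < 0).
    """
    a_bar = cmath.exp(a * dt)
    b_bar = (a_bar - 1) / a * b
    return a_bar, b_bar


def ssm_recurrent(a_bar: Sequence[complex], b_bar: Sequence[complex],
                  c: Sequence[complex], u: Sequence[float]) -> List[float]:
    """Recurrent mode: update the hidden state one step at a time (inference-style)."""
    x = [0j] * len(a_bar)  # one complex state per diagonal mode
    ys = []
    for u_k in u:
        # x_k = A_bar * x_{k-1} + B_bar * u_k, applied elementwise (diagonal A)
        x = [ab * xi + bb * u_k for ab, xi, bb in zip(a_bar, x, b_bar)]
        # y_k = Re(C x_k)
        ys.append(sum(ci * xi for ci, xi in zip(c, x)).real)
    return ys


def ssm_kernel(a_bar: Sequence[complex], b_bar: Sequence[complex],
               c: Sequence[complex], length: int) -> List[float]:
    """Convolution kernel K_j = Re(sum_i C_i * A_bar_i**j * B_bar_i) for j < length."""
    return [
        sum(ci * ab ** j * bb for ci, ab, bb in zip(c, a_bar, b_bar)).real
        for j in range(length)
    ]


def ssm_convolution(a_bar: Sequence[complex], b_bar: Sequence[complex],
                    c: Sequence[complex], u: Sequence[float]) -> List[float]:
    """Convolutional mode: y_t = sum_{j<=t} K_j * u_{t-j} over the whole sequence.

    Naive O(L^2) loop for clarity; an efficient implementation would use an FFT.
    """
    k = ssm_kernel(a_bar, b_bar, c, len(u))
    return [sum(k[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]


if __name__ == "__main__":
    # Two illustrative modes with arbitrary (assumed) parameters.
    modes = [discretize_zoh(complex(-0.5, 1.0), 1.0, 0.1),
             discretize_zoh(complex(-1.0, 0.3), 1.0, 0.1)]
    a_bar = [m[0] for m in modes]
    b_bar = [m[1] for m in modes]
    c = [complex(0.5, -0.2), complex(1.0, 0.1)]
    u = [1.0, 0.0, -2.0, 0.5, 3.0]
    y_rec = ssm_recurrent(a_bar, b_bar, c, u)
    y_conv = ssm_convolution(a_bar, b_bar, c, u)
    # The two modes agree up to floating-point rounding.
    print(max(abs(p - q) for p, q in zip(y_rec, y_conv)))
```

Because the input is real, taking the real part of each kernel entry gives the same outputs as taking the real part of the recurrent state readout, so both functions compute the same sequence. The recurrent form is what makes per-step inference cheap, while the convolutional form lets training process a full trajectory in parallel.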

Recently, sequence learning methods have been applied to the problem of off-policy reinforcement learning, including the seminal work on Decision Transformers, which employs transformers for this task. Since transformers are parameter-heavy, cannot benefit from history longer than a fixed window size, and are not computed using recurrence, we set out to investigate the suitability of the S4 family of models, which are based on state-space layers and have been shown to outperform transformers, especially in modeling long-range dependencies. In this work we present two main algorithms: (i) an off-policy training procedure that works with trajectories while still maintaining the training efficiency of the S4 model, and (ii) an on-policy training procedure that is trained in a recurrent manner, benefits from long-range dependencies, and is based on a novel stable actor-critic mechanism. Our results indicate that our method outperforms multiple variants of Decision Transformers, as well as the other baseline methods on most tasks, while reducing the latency, number of parameters, and training time by several orders of magnitude, making our approach more suitable for real-world RL.
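Trajectory-based off-policy training in the Decision Transformer lineage typically conditions each action on the return-to-go (the sum of future rewards) alongside the state. A minimal sketch of turning one logged trajectory into such a sequence follows, assuming a Decision Transformer-style interleaving of (return-to-go, state, action) tokens. The abstract does not specify Decision S4's exact input layout, and `returns_to_go` and `build_sequence` are hypothetical helper names.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """R_t = r_t + gamma * R_{t+1}, computed backwards from the last step.

    gamma = 1.0 gives the undiscounted sum of future rewards used in
    Decision Transformer-style conditioning.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def build_sequence(states: Sequence[Any], actions: Sequence[Any],
                   rewards: Sequence[float]) -> List[Tuple[str, Any]]:
    """Interleave (return-to-go, state, action) tokens per timestep (assumed layout)."""
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have the same length")
    seq: List[Tuple[str, Any]] = []
    for r, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", r), ("state", s), ("action", a)])
    return seq


if __name__ == "__main__":
    trajectory = build_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0])
    print(trajectory)
```

For rewards `[1.0, 0.0, 2.0]`, `returns_to_go` yields `[3.0, 2.0, 2.0]`. Accumulating backwards keeps the computation linear in trajectory length, and because the sequence has no fixed window, a recurrent model can in principle consume the whole history rather than a truncated context.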

