LG

Jun 8, 2023

Decision S4: Efficient Sequence-Based RL via State Spaces Layers

Shmuel Bar-David, Itamar Zimerman, Eliya Nachmani, Lior Wolf

Meta AI

arXiv:2306.05167v1

21.734 citations

h-index: 38

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making reinforcement learning more efficient and scalable for real-world applications, though its contribution is an incremental improvement over existing sequence learning methods.

The paper tackles the problem of inefficient sequence-based reinforcement learning by proposing Decision S4, which uses state-space layers to model long-range dependencies more effectively than transformers. The results show that it outperforms decision transformers and other baselines on most tasks, reducing latency, parameters, and training time by several orders of magnitude.

Recently, sequence learning methods have been applied to the problem of off-policy Reinforcement Learning, including the seminal work on Decision Transformers, which employs transformers for this task. Since transformers are parameter-heavy, cannot benefit from history longer than a fixed window size, and are not computed using recurrence, we set out to investigate the suitability of the S4 family of models, which are based on state-space layers and have been shown to outperform transformers, especially in modeling long-range dependencies. In this work we present two main algorithms: (i) an off-policy training procedure that works with trajectories while still maintaining the training efficiency of the S4 model, and (ii) an on-policy training procedure that is trained in a recurrent manner, benefits from long-range dependencies, and is based on a novel stable actor-critic mechanism. Our results indicate that our method outperforms multiple variants of decision transformers, as well as the other baseline methods, on most tasks, while reducing the latency, number of parameters, and training time by several orders of magnitude, making our approach more suitable for real-world RL.
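The abstract contrasts efficient trajectory-based training with recurrent execution. As background, a linear state-space layer can be evaluated either one step at a time or as a convolution over the whole input sequence, and both give the same outputs. The sketch below illustrates this with a toy diagonal state-space model in plain Python. It is not the authors' code and does not use the structured S4 parameterization; the function names and the values of `a`, `b`, `c`, and `inputs` are arbitrary assumptions chosen for illustration.

```python
# Toy diagonal linear state-space layer (illustrative assumption, not S4 itself):
#   x_k = a * x_{k-1} + b * u_k   (elementwise over the state dimensions)
#   y_k = sum_i c_i * x_k[i]

def ssm_recurrent(a, b, c, inputs):
    """Evaluate the layer step by step, carrying a fixed-size state."""
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        state = [a_i * x_i + b_i * u for a_i, b_i, x_i in zip(a, b, state)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs

def ssm_kernel(a, b, c, length):
    """Convolution kernel K_j = sum_i c_i * a_i**j * b_i for j = 0..length-1."""
    return [
        sum(c_i * (a_i ** j) * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(length)
    ]

def ssm_convolutional(a, b, c, inputs):
    """Evaluate the same layer as a causal convolution over the full sequence."""
    kernel = ssm_kernel(a, b, c, len(inputs))
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(len(inputs))
    ]

# Hypothetical parameters and inputs, chosen only for the demonstration.
a = [0.9, 0.5]
b = [1.0, 0.5]
c = [0.3, -0.2]
inputs = [1.0, 0.0, 2.0, -1.0]

y_rec = ssm_recurrent(a, b, c, inputs)
y_conv = ssm_convolutional(a, b, c, inputs)

# Both evaluation modes agree up to floating-point error.
assert all(abs(r - v) < 1e-9 for r, v in zip(y_rec, y_conv))
```

The recurrent form needs only the current state at each step, which suits step-by-step interaction with an environment, while the convolutional form processes a whole stored trajectory at once.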
