LG | AI | Jun 9, 2023

Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models

arXiv:2306.06253v2 | 9 citations | h-index: 38
AI Analysis

This work addresses the trade-off between expressivity and flexibility in reinforcement learning. The contribution appears incremental, since it builds on existing modular generative models.

The authors address the challenge of integrating multiple aspects of sequential decision making in reinforcement learning. They propose Decision Stacks, a generative framework that decomposes agents into three modules for observations, rewards, and actions. In offline policy optimization on MDP and POMDP environments, the framework outperformed existing methods.

Reinforcement learning presents an attractive paradigm to reason about several distinct aspects of sequential decision making, such as specifying complex goals, planning future observations and actions, and critiquing their utilities. However, the combined integration of these capabilities poses competing algorithmic challenges in retaining maximal expressivity while allowing for flexibility in modeling choices for efficient learning and inference. We present Decision Stacks, a generative framework that decomposes goal-conditioned policy agents into 3 generative modules. These modules simulate the temporal evolution of observations, rewards, and actions via independent generative models that can be learned in parallel via teacher forcing. Our framework guarantees both expressivity and flexibility in designing individual modules to account for key factors such as architectural bias, optimization objective and dynamics, transferability across domains, and inference speed. Our empirical results demonstrate the effectiveness of Decision Stacks for offline policy optimization for several MDP and POMDP environments, outperforming existing methods and enabling flexible generative decision making.
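To make the modular idea concrete, here is a minimal toy sketch of three independently trained modules (observations, rewards, actions) that are fit with teacher forcing and then chained at inference time. It is not the authors' implementation. Everything in it is an assumption made for illustration: the scalar linear models, the synthetic dynamics in `make_trajectory`, the choice to condition the action module only on consecutive observations, and all function names.

```python
# Toy sketch only: scalar linear models stand in for the generative modules.

def fit_linear(xs, ys):
    """Least-squares fit of y = w * x + b for scalar inputs and targets."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return w, mean_y - w * mean_x


def predict(model, x):
    w, b = model
    return w * x + b


def make_trajectory(o0, horizon=5):
    """Hypothetical offline data: linear dynamics, reward, and action rule."""
    obs, rews, acts = [o0], [], []
    for _ in range(horizon):
        o = obs[-1]
        o_next = 0.9 * o + 1.0
        rews.append(2.0 * o - 1.0)
        acts.append(0.5 * (o_next - o))
        obs.append(o_next)
    return obs, rews, acts


dataset = [make_trajectory(o0) for o0 in (0.0, 1.0, 2.0, 3.0)]

# Teacher forcing: each module's inputs come from ground-truth data, never
# from another module's predictions, so the three fits do not depend on
# each other and could run in parallel.
obs_in, obs_out, rew_out, act_in, act_out = [], [], [], [], []
for obs, rews, acts in dataset:
    for t in range(len(acts)):
        obs_in.append(obs[t])
        obs_out.append(obs[t + 1])
        rew_out.append(rews[t])
        act_in.append(obs[t + 1] - obs[t])
        act_out.append(acts[t])

obs_model = fit_linear(obs_in, obs_out)   # o_t -> o_{t+1}
rew_model = fit_linear(obs_in, rew_out)   # o_t -> r_t
act_model = fit_linear(act_in, act_out)   # (o_{t+1} - o_t) -> a_t


def rollout(o0, horizon=3):
    """At inference, chain the modules: each consumes predicted observations."""
    plan, o = [], o0
    for _ in range(horizon):
        o_next = predict(obs_model, o)
        r = predict(rew_model, o)
        a = predict(act_model, o_next - o)
        plan.append((o_next, r, a))
        o = o_next
    return plan
```

Because each module is fit separately, any one of them could in principle be swapped for a different model class without retraining the others, which is the kind of flexibility the abstract describes.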

Code Implementations: 1 repo