LG | Oct 28, 2024

BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

arXiv:2410.21151v2 | 5 citations | h-index: 15
Originality: Highly original
AI Analysis

BraVE addresses a computational bottleneck for researchers and practitioners in offline RL who work with large combinatorial action spaces, and it represents a strong, specific gain rather than an incremental improvement.

The paper tackles offline reinforcement learning in high-dimensional discrete combinatorial action spaces, where existing methods are either computationally infeasible or fail to model sub-action dependencies. It proposes Branch Value Estimation (BraVE), which outperforms prior methods by up to 20x in environments with over four million actions.

Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to $20\times$ in environments with over four million actions.
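The scaling argument in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy reward, and the brute-force `toy_branch_value` oracle are all hypothetical stand-ins (in BraVE the branch values would come from a learned model). The sketch assumes binary sub-actions and shows that a greedy walk down an action tree scores only a linear number of branches, while the joint action space grows exponentially. As an illustration, 22 binary sub-actions would already give 2^22 = 4,194,304 joint actions; the paper's actual environment configuration is not stated here.

```python
from itertools import product
from typing import Callable, Tuple

Action = Tuple[int, ...]


def joint_action_count(num_sub_actions: int, choices: int = 2) -> int:
    """Size of the joint action space: exponential in the number of sub-actions."""
    return choices ** num_sub_actions


def greedy_tree_traversal(
    branch_value: Callable[[Action], float],
    num_sub_actions: int,
    choices: int = 2,
) -> Tuple[Action, int]:
    """Fix one sub-action per tree level, keeping the best-scoring branch.

    Scores num_sub_actions * choices branches in total (linear), instead of
    enumerating every joint action.
    """
    prefix: Action = ()
    evaluations = 0
    for _ in range(num_sub_actions):
        best_choice, best_value = 0, float("-inf")
        for choice in range(choices):
            value = branch_value(prefix + (choice,))
            evaluations += 1
            if value > best_value:
                best_choice, best_value = choice, value
        prefix = prefix + (best_choice,)
    return prefix, evaluations


# Hypothetical toy problem with dependent sub-actions: the reward counts
# adjacent sub-actions that differ, so it cannot be scored one sub-action
# at a time.
NUM_SUB_ACTIONS = 4


def joint_reward(action: Action) -> int:
    return sum(1 for x, y in zip(action, action[1:]) if x != y)


def toy_branch_value(prefix: Action) -> float:
    """Stand-in oracle: best reward reachable below this branch.

    Computed by brute force here, which is only feasible for tiny problems.
    """
    remaining = NUM_SUB_ACTIONS - len(prefix)
    return max(
        joint_reward(prefix + tail) for tail in product(range(2), repeat=remaining)
    )


if __name__ == "__main__":
    action, evaluations = greedy_tree_traversal(toy_branch_value, NUM_SUB_ACTIONS)
    print(action, evaluations, joint_action_count(NUM_SUB_ACTIONS))
    print(joint_action_count(22))
```

In this toy setting the traversal scores 8 branches against 16 joint actions; the gap widens quickly because the first quantity grows linearly with the number of sub-actions and the second exponentially. The quality of the selected action depends entirely on how accurate the branch values are, which is the part a learned estimator would have to supply.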
