MLLG | Nov 17, 2024

An Investigation of Offline Reinforcement Learning in Factorisable Action Spaces

arXiv:2411.11088v1 | 4 citations | h-index: 3 | Has Code | Trans. Mach. Learn. Res.
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in offline RL for factorisable action spaces, which could benefit domains where data collection is difficult or risky. It is incremental, however, in that it builds on prior methods rather than introducing a major breakthrough.

The paper tackles offline reinforcement learning in factorisable action spaces, which are common in real-world applications but understudied. It adapts existing offline techniques to the factorised setting and introduces new datasets for evaluation. The abstract reports no specific performance numbers.

Expanding reinforcement learning (RL) to offline domains generates promising prospects, particularly in sectors where data collection poses substantial challenges or risks. Pivotal to the success of transferring RL offline is mitigating overestimation bias in value estimates for state-action pairs absent from data. Whilst numerous approaches have been proposed in recent years, these tend to focus primarily on continuous or small-scale discrete action spaces. Factorised discrete action spaces, on the other hand, have received relatively little attention, despite many real-world problems naturally having factorisable actions. In this work, we undertake a formative investigation into offline reinforcement learning in factorisable action spaces. Using value-decomposition as formulated in DecQN as a foundation, we present the case for a factorised approach and conduct an extensive empirical evaluation of several offline techniques adapted to the factorised setting. In the absence of established benchmarks, we introduce a suite of our own comprising datasets of varying quality and task complexity. Advocating for reproducible research and innovation, we make all datasets available for public use alongside our code base.
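To make the value-decomposition idea in the abstract concrete, here is a minimal sketch. It assumes the DecQN form, in which the value of a joint action is the mean of per-dimension utilities, so the greedy joint action can be found with one argmax per sub-action dimension. The function names and utility values are hypothetical and are not taken from the paper's code base.

```python
import numpy as np

def decomposed_q(utilities, action):
    """Joint value as the mean of the utilities of the chosen sub-actions."""
    chosen = [u[a] for u, a in zip(utilities, action)]
    return float(np.mean(chosen))

def greedy_action(utilities):
    """Greedy joint action: an independent argmax over each sub-action dimension."""
    return tuple(int(np.argmax(u)) for u in utilities)

# Hypothetical utilities for a single state:
# 3 sub-action dimensions with 3 discrete choices each.
utilities = [
    np.array([0.1, 0.5, 0.2]),
    np.array([0.4, 0.0, 0.3]),
    np.array([0.2, 0.2, 0.9]),
]

best = greedy_action(utilities)
q_best = decomposed_q(utilities, best)

# Size of the joint action space versus the number of utility outputs.
n_joint = int(np.prod([len(u) for u in utilities]))
n_outputs = sum(len(u) for u in utilities)
```

Under this assumption the number of network outputs grows with the sum of the sub-action counts (9 here) rather than their product (27 here), which is the usual argument for a factorised approach.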

Code Implementations: 1 repo
