LGMLAug 31, 2020

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

arXiv:2008.13319v321 citations
Originality Highly original
AI Analysis

This work addresses sample efficiency in complex, structured environments for reinforcement learning practitioners, though it is incremental as it builds on existing FMDP methods.

The paper tackles efficient reinforcement learning in factored MDPs by proposing the FMDP-BF algorithm, which achieves regret exponentially smaller than non-factored MDP algorithms and improves on prior FMDP results by a factor of sqrt(H|S_i|), and applies it to constrained RL with knapsack constraints to provide the first sample-efficient algorithm.

Reinforcement learning (RL) in episodic, factored Markov decision processes (FMDPs) is studied. We propose an algorithm called FMDP-BF, which leverages the factorization structure of FMDP. The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and improves on the best previous result for FMDPs~\citep{osband2014near} by a factored of $\sqrt{H|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace and $H$ is the planning horizon. To show the optimality of our bounds, we also provide a lower bound for FMDP, which indicates that our algorithm is near-optimal w.r.t. timestep $T$, horizon $H$ and factored state-action subspace cardinality. Finally, as an application, we study a new formulation of constrained RL, known as RL with knapsack constraints (RLwK), and provides the first sample-efficient algorithm based on FMDP-BF.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes