LG MLAug 31, 2020

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Xiaoyu Chen, Jiachen Hu, Lihong Li, Liwei Wang

arXiv:2008.13319v312.021 citations

Originality Highly original

AI Analysis

This work addresses sample efficiency in complex, structured environments for reinforcement learning practitioners, though it is incremental as it builds on existing FMDP methods.

The paper tackles efficient reinforcement learning in factored MDPs by proposing the FMDP-BF algorithm, which achieves regret exponentially smaller than non-factored MDP algorithms and improves on prior FMDP results by a factor of sqrt(H|S_i|), and applies it to constrained RL with knapsack constraints to provide the first sample-efficient algorithm.

Reinforcement learning (RL) in episodic, factored Markov decision processes (FMDPs) is studied. We propose an algorithm called FMDP-BF, which leverages the factorization structure of FMDP. The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and improves on the best previous result for FMDPs~\citep{osband2014near} by a factored of $\sqrt{H|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace and $H$ is the planning horizon. To show the optimality of our bounds, we also provide a lower bound for FMDP, which indicates that our algorithm is near-optimal w.r.t. timestep $T$, horizon $H$ and factored state-action subspace cardinality. Finally, as an application, we study a new formulation of constrained RL, known as RL with knapsack constraints (RLwK), and provides the first sample-efficient algorithm based on FMDP-BF.

View on arXiv PDF

Similar