LG, ML · Sep 13, 2020

Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure

arXiv:2009.05986v4 · 7.9 · 13 citations · Has Code

Originality: Highly original

AI Analysis

This paper addresses reinforcement learning in complex environments and is aimed at researchers and practitioners. It offers a novel approach, although its improvements to regret bounds are incremental.

The paper tackles regret minimization in factored MDPs with unknown structure. It provides the first algorithm that learns the structure while minimizing regret, shows that the algorithm can be implemented efficiently given oracle access to a planner, and proves a novel lower bound for the known-structure case.

We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first algorithm that learns the structure of the FMDP while minimizing the regret. Our algorithm is based on the principle of optimism in the face of uncertainty, combined with a simple statistical method for structure learning, and can be implemented efficiently given oracle access to an FMDP planner. Moreover, we give a variant of our algorithm that remains efficient even when the oracle is limited to non-factored actions, which is the case for almost all existing approximate planners. Finally, we leverage our techniques to prove a novel lower bound for the known-structure case, closing the gap to the regret bound of Chen et al. [2021].
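The abstract does not spell out the structure-learning test, so the following is only a rough illustration of the general idea. It is a minimal Python sketch of one generic way to learn the scope (parent set) of a single next-state factor from samples: a candidate scope of size m is kept only if conditioning on any other candidate scope never shifts the empirical next-value distribution by more than the combined confidence radii. The function names, the `(x, y)` sample format, and the Weissman-style L1 radius are assumptions made for this sketch; it is not the paper's exact procedure.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations


def empirical(samples, scope):
    """Map each assignment of the variables in `scope` to a Counter of observed next values."""
    table = defaultdict(Counter)
    for x, y in samples:
        table[tuple(x[j] for j in scope)][y] += 1
    return table


def l1_distance(c1, c2):
    """L1 distance between the empirical distributions given by two Counters."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    return sum(abs(c1[v] / n1 - c2[v] / n2) for v in set(c1) | set(c2))


def radius(n, num_values, delta):
    # Assumed L1 concentration radius (Weissman-style); constants are illustrative.
    return math.sqrt(2 * (num_values * math.log(2) + math.log(1 / delta)) / n)


def consistent_scopes(samples, num_vars, m, num_values, delta=0.05):
    """Hypothetical helper: return every size-m scope that no other scope refutes."""
    candidates = list(combinations(range(num_vars), m))
    tables = {z: empirical(samples, z) for z in candidates}
    kept = []
    for z in candidates:
        consistent = True
        for z2 in candidates:
            union = tuple(sorted(set(z) | set(z2)))
            for w, counts_w in empirical(samples, union).items():
                # Project the joint assignment w (ordered like `union`) onto z.
                assignment = dict(zip(union, w))
                counts_z = tables[z][tuple(assignment[j] for j in z)]
                bound = (radius(sum(counts_w.values()), num_values, delta)
                         + radius(sum(counts_z.values()), num_values, delta))
                if l1_distance(counts_w, counts_z) > bound:
                    consistent = False
                    break
            if not consistent:
                break
        if consistent:
            kept.append(z)
    return kept


if __name__ == "__main__":
    # Synthetic data: three binary variables; the next value copies variable 0.
    rng = random.Random(0)
    samples = []
    for _ in range(2000):
        x = tuple(rng.randint(0, 1) for _ in range(3))
        samples.append((x, x[0]))
    print(consistent_scopes(samples, num_vars=3, m=1, num_values=2))  # expected: [(0,)]
```

In the usage example only the true scope `(0,)` survives: conditioning on variable 0 makes the next value deterministic, which refutes scopes `(1,)` and `(2,)`. The sketch covers only this elimination step; the optimistic planning and the planner-oracle calls described in the abstract are not modeled.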
