LG
Jul 12, 2021

Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions

Zihao Deng, Siddartha Devic, Brendan Juba

arXiv:2107.05187v2
7.54 citations

Originality: Highly original

AI Analysis

This paper addresses the scalability of RL in domains whose state spaces have a compact, structured description. Its approach relaxes the conditional independence assumptions of prior FMDP work, though the advance over that work is incremental.

The paper tackles reinforcement learning in factored state MDPs with enormous state spaces. It presents the first polynomial-time algorithm that relies on neither an oracle planner nor a linear transition model, and it obtains an efficient solution via convex optimization with a separation oracle.

Many reinforcement learning (RL) environments in practice feature enormous state spaces that can be described compactly by a "factored" structure and modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL in Factored State MDPs (generalizing FMDPs) that neither relies on an oracle planner nor requires a linear transition model; it only requires a linear value function with a suitable local basis with respect to the factorization, permitting efficient variable elimination. With this assumption, we can solve this family of Factored State MDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work on FMDPs, we do not assume that the transitions on various factors are conditionally independent.
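To make the role of variable elimination concrete, here is a minimal sketch of the general technique, not the paper's algorithm. It assumes binary state variables and a function that is a sum of local terms, each depending on only a few variables (as a linear value function with a local basis would be). The names `eliminate_max`, `brute_force_max`, `pair_table`, and the chain-shaped example are all hypothetical choices made for illustration.

```python
from itertools import product


def eliminate_max(factors, order, domain=(0, 1)):
    """Maximize a sum of local factors by eliminating one variable at a time.

    factors: list of (scope, table); scope is a tuple of variable names and
    table maps a tuple of values (aligned with scope) to a float.
    order: elimination order; it must list every variable in any scope.
    """
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        # The new factor depends on the other variables that co-occur with var.
        new_scope = tuple(sorted({v for s, _ in touching for v in s if v != var}))
        new_table = {}
        for values in product(domain, repeat=len(new_scope)):
            assignment = dict(zip(new_scope, values))
            best = float("-inf")
            for x in domain:
                assignment[var] = x
                total = sum(t[tuple(assignment[v] for v in s)] for s, t in touching)
                best = max(best, total)
            new_table[values] = best
        factors = rest + [(new_scope, new_table)]
    # Every variable is gone, so each remaining factor has an empty scope.
    return sum(t[()] for _, t in factors)


def brute_force_max(factors, variables, domain=(0, 1)):
    """Reference answer: enumerate all joint assignments (exponential cost)."""
    best = float("-inf")
    for values in product(domain, repeat=len(variables)):
        a = dict(zip(variables, values))
        best = max(best, sum(t[tuple(a[v] for v in s)] for s, t in factors))
    return best


def pair_table(w):
    """Hypothetical local term on two binary variables."""
    return {(a, b): w * (a != b) + 0.5 * a for a, b in product((0, 1), repeat=2)}


# A chain x1 - x2 - x3 - x4: each term touches only two neighbouring variables.
variables = ["x1", "x2", "x3", "x4"]
factors = [
    (("x1", "x2"), pair_table(1.0)),
    (("x2", "x3"), pair_table(-2.0)),
    (("x3", "x4"), pair_table(3.0)),
]

print(eliminate_max(factors, variables))
print(brute_force_max(factors, variables))
```

On this chain, each elimination step builds a table over at most one remaining variable, so the work grows linearly with the number of variables, while the brute-force check enumerates all joint assignments. In general the cost of elimination depends on the scopes of the local terms and on the chosen order.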
