LGJun 24, 2021

A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

arXiv:2106.13013v116 citations
Originality Incremental advance
AI Analysis

This work addresses a foundational theoretical gap in reinforcement learning by refining regret analysis for MDPs, though it is incremental as it builds on prior lower-bound derivations.

The paper tackles the problem of deriving a problem-dependent regret lower bound for finite-horizon tabular Markov Decision Processes (MDPs), revealing that an additional constraint based on MDP dynamics leads to varying complexity across different MDP instances, with results showing larger lower bounds in difficult cases and smaller ones in simple cases, and provides an attainable upper bound up to poly(H) terms.

We derive a novel asymptotic problem-dependent lower-bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs). While, similar to prior work (e.g., for ergodic MDPs), the lower-bound is the solution to an optimization problem, our derivation reveals the need for an additional constraint on the visitation distribution over state-action pairs that explicitly accounts for the dynamics of the MDP. We provide a characterization of our lower-bound through a series of examples illustrating how different MDPs may have significantly different complexity. 1) We first consider a "difficult" MDP instance, where the novel constraint based on the dynamics leads to a larger lower-bound (i.e., a larger regret) compared to the classical analysis. 2) We then show that our lower-bound recovers results previously derived for specific MDP instances. 3) Finally, we show that, in certain "simple" MDPs, the lower bound is considerably smaller than in the general case and it does not scale with the minimum action gap at all. We show that this last result is attainable (up to $poly(H)$ terms, where $H$ is the horizon) by providing a regret upper-bound based on policy gaps for an optimistic algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes