LGMLAug 3, 2023

Minimax Optimal Q Learning with Nearest Neighbors

arXiv:2308.01490v216 citationsh-index: 8
Originality Incremental advance
AI Analysis

This provides more efficient reinforcement learning algorithms for continuous state spaces, with incremental improvements in sample and computational complexity.

The paper tackles the problem of Q-learning in Markov decision processes with continuous state spaces by proposing new nearest neighbor methods for offline and online settings, achieving sample complexities of $ ilde{O}( rac{1}{\epsilon^{d+2}(1-\gamma)^{d+2}})$ and $ ilde{O}( rac{1}{\epsilon^{d+2}(1-\gamma)^{d+3}})$ respectively, which significantly improve over prior work and are minimax optimal in $\epsilon$.

Analyzing the Markov decision process (MDP) with continuous state spaces is generally challenging. A recent interesting work \cite{shah2018q} solves MDP with bounded continuous state space by a nearest neighbor $Q$ learning approach, which has a sample complexity of $\tilde{O}(\frac{1}{ε^{d+3}(1-γ)^{d+7}})$ for $ε$-accurate $Q$ function estimation with discount factor $γ$. In this paper, we propose two new nearest neighbor $Q$ learning methods, one for the offline setting and the other for the online setting. We show that the sample complexities of these two methods are $\tilde{O}(\frac{1}{ε^{d+2}(1-γ)^{d+2}})$ and $\tilde{O}(\frac{1}{ε^{d+2}(1-γ)^{d+3}})$ for offline and online methods respectively, which significantly improve over existing results and have minimax optimal dependence over $ε$. We achieve such improvement by utilizing the samples more efficiently. In particular, the method in \cite{shah2018q} clears up all samples after each iteration, thus these samples are somewhat wasted. On the other hand, our offline method does not remove any samples, and our online method only removes samples with time earlier than $βt$ at time $t$ with $β$ being a tunable parameter, thus our methods significantly reduce the loss of information. Apart from the sample complexity, our methods also have additional advantages of better computational complexity, as well as suitability to unbounded state spaces.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes