MLLGDec 12, 2022

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

arXiv:2212.05949v441 citationsh-index: 64
AI Analysis

This work addresses robustness to corruption in reinforcement learning for scenarios with general function approximation, representing a significant advance over prior linear or suboptimal bounds.

The paper tackles the problem of adversarial corruption in nonlinear contextual bandits and Markov decision processes, achieving a regret bound of $ ilde{O}(\sqrt{T}+ζ)$, which improves upon existing methods by reducing dependence on corruption level $ζ$ from multiplicative to additive.

Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}ζ)$ regret bound, where $T$ is the number of rounds and $ζ$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+ζ)$. The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit and a new weighted estimator of uncertainty for the general function class. In contrast to the existing analysis that heavily relies on the linear structure, we develop a novel technique to control the sum of weighted uncertainty, thus establishing the final regret bounds. We then generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $ζ$ in the scenario of general function approximation. Notably, our algorithms achieve regret bounds either nearly match the performance lower bound or improve the existing methods for all the corruption levels and in both known and unknown $ζ$ cases.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes