LG · Feb 8, 2021

Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature

arXiv:2102.04168v5 · 39 citations
Originality: Highly original
AI Analysis

This work provides a theoretical framework for provably converging to local optima in nonlinear bandit and RL problems. It matters for researchers and practitioners who work with complex function approximators, where finding a global optimum is statistically intractable.

This paper addresses model-based bandit and reinforcement learning with nonlinear function approximations, showing that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward. The authors propose ViOlin, an algorithm that provably converges to a local maximum with sample complexity that depends only on the sequential Rademacher complexity of the model class. This yields novel global or local regret bounds for linear bandits with finite or sparse model classes and for two-layer neural net bandits.

This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations. We propose to study convergence to approximate local maxima because we show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward. For both nonlinear bandit and RL, we present a model-based algorithm, Virtual Ascent with Online Model Learner (ViOlin), which provably converges to a local maximum with sample complexity that depends only on the sequential Rademacher complexity of the model class. Our results imply novel global or local regret bounds in several concrete settings, such as linear bandits with a finite or sparse model class and two-layer neural net bandits. A key algorithmic insight is that optimism may lead to over-exploration even for a two-layer neural net model class. On the other hand, for convergence to local maxima, it suffices to maximize the virtual return if the model can also reasonably predict the size of the gradient and Hessian of the real return.
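The idea of maximizing the virtual return while an online learner keeps the model's values and gradients close to the real ones can be illustrated with a small sketch. This is not the paper's ViOlin algorithm: the quadratic reward model `eta`, the finite-difference gradient probe, the plain gradient-descent model update, and all step sizes are assumptions chosen for illustration. The Hessian-matching term is omitted because the Hessian of this toy model does not depend on the model parameter.

```python
import numpy as np

def eta(theta, a):
    """Hypothetical toy reward model: eta(theta, a) = <theta, a> - 0.5 * ||a||^2."""
    return float(theta @ a - 0.5 * (a @ a))

def grad_a(theta, a):
    """Gradient of the toy model with respect to the action a."""
    return theta - a

def virtual_ascent(theta_star, steps=200, lr_model=0.2, lr_action=0.5, seed=0):
    """Alternate virtual-reward ascent on the action with online model fitting."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    theta = np.zeros(d)  # learner's current model parameter
    a = np.zeros(d)      # current action
    h = 1e-4             # finite-difference step (an assumption of this sketch)
    for _ in range(steps):
        # Action step: gradient ascent on the virtual reward eta(theta, .).
        a = a + lr_action * grad_a(theta, a)

        # Query the real (deterministic) reward at a and along a random direction u.
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        r = eta(theta_star, a)
        dr = (eta(theta_star, a + h * u) - eta(theta_star, a - h * u)) / (2 * h)

        # Model step: reduce the squared error in the predicted value and in the
        # predicted directional derivative. For this model, d eta / d theta = a
        # and d <grad_a(theta, a), u> / d theta = u.
        err_val = eta(theta, a) - r
        err_dir = grad_a(theta, a) @ u - dr
        theta = theta - lr_model * (err_val * a + err_dir * u)
    return a, theta

theta_star = np.array([1.0, -0.5, 0.3])
a, theta = virtual_ascent(theta_star)
print("real gradient norm at a:", np.linalg.norm(grad_a(theta_star, a)))
```

In this toy setting the real reward is concave, so a small real gradient norm at the returned action already certifies a maximum. In a nonconcave setting the Hessian condition would also need to be checked.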

