LG ST ML Feb 26, 2020

Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Vidyashankar Sivakumar, Zhiwei Steven Wu, Arindam Banerjee

arXiv:2002.11332v1 12.824 citations

Originality: Incremental advance

AI Analysis

This work addresses bandit learning in practical settings where worst-case scenarios requiring systematic exploration are rare. It offers an improved analysis for structured parameters, but the advance is incremental because it builds on existing smoothed-analysis frameworks.

The paper tackles the problem of structured linear contextual bandits in a smoothed setting with perturbed adversarial contexts, proposing simple greedy algorithms for single- and multi-parameter cases and providing unified regret bounds in terms of geometric quantities like Gaussian widths, with sharper bounds for unstructured settings.
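For readers unfamiliar with the term, the Gaussian width of a set $A \subseteq \mathbb{R}^p$ has the standard definition below. The symbols $A$, $g$, $p$, and $s$ are introduced here for illustration; this summary does not say which sets the paper applies the width to.

$$
w(A) \;=\; \mathbb{E}_{g \sim \mathcal{N}(0, I_p)} \left[ \sup_{a \in A} \langle a, g \rangle \right]
$$

As a rough guide, the unit Euclidean ball in $\mathbb{R}^p$ has width at most $\sqrt{p}$, while the set of $s$-sparse unit vectors has width of order $\sqrt{s \log(p/s)}$, which is why bounds stated in terms of widths can be much smaller when $θ^*$ is structured.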

Bandit learning algorithms typically involve the balance of exploration and exploitation. However, in many practical applications, worst-case scenarios needing systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise and the unknown parameter $θ^*$ has structure, e.g., sparsity, group sparsity, low rank, etc. We propose simple greedy algorithms for both the single- and multi-parameter (i.e., different parameter for each context) settings and provide a unified regret analysis for $θ^*$ with any assumed structure. The regret bounds are expressed in terms of geometric quantities such as Gaussian widths associated with the structure of $θ^*$. We also obtain sharper regret bounds compared to earlier work for the unstructured $θ^*$ setting as a consequence of our improved analysis. We show there is implicit exploration in the smoothed setting where a simple greedy algorithm works.
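The smoothed greedy idea in the abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function name `greedy_smoothed_bandit`, all parameter values, and the fixed base contexts standing in for an adversary are hypothetical, and plain ridge regression is swapped in for whatever structured estimator the paper uses.

```python
import numpy as np

def greedy_smoothed_bandit(theta_star, T=2000, k=5, sigma=0.5,
                           noise_sd=0.1, lam=1.0, seed=0):
    """Greedy linear contextual bandit on Gaussian-perturbed contexts.

    Illustrative assumptions: fixed base contexts play the adversary,
    and the estimator is ridge regression (no structure is exploited).
    """
    rng = np.random.default_rng(seed)
    p = theta_star.shape[0]

    # Base contexts, one unit-norm row per arm (stand-in for an adversary).
    base = rng.uniform(-1.0, 1.0, size=(k, p))
    base /= np.linalg.norm(base, axis=1, keepdims=True)

    gram = lam * np.eye(p)   # regularized Gram matrix of chosen contexts
    xty = np.zeros(p)        # running sum of reward * chosen context
    cum_regret = np.zeros(T)
    total = 0.0

    for t in range(T):
        # Smoothing step: perturb every context with Gaussian noise.
        contexts = base + sigma * rng.standard_normal((k, p))

        # Greedy step: act on the current estimate, no exploration bonus.
        theta_hat = np.linalg.solve(gram, xty)
        arm = int(np.argmax(contexts @ theta_hat))
        x = contexts[arm]

        # Observe a noisy linear reward and update the estimator.
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        gram += np.outer(x, x)
        xty += reward * x

        # Regret against the best perturbed context in this round.
        values = contexts @ theta_star
        total += values.max() - values[arm]
        cum_regret[t] = total

    return cum_regret, np.linalg.solve(gram, xty)

# Hypothetical sparse parameter: 2 nonzero entries out of 10.
theta_star = np.zeros(10)
theta_star[:2] = [1.0, -0.5]
cum_regret, theta_hat = greedy_smoothed_bandit(theta_star)
```

The perturbation is the only source of exploration here: because each chosen context carries fresh Gaussian noise, the Gram matrix keeps growing in every direction even though the policy is purely greedy.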
