GT, AI, CY, LG
Jun 8, 2020

Learning under Invariable Bayesian Safety

arXiv:2006.04497v1
AI Analysis

This work addresses safety for individuals in recommendation-like systems, but it appears incremental as it builds on existing bandit models.

The paper tackles safety constraints in explore-and-exploit systems. It introduces a constraint requiring the expected value in each round to exceed a threshold, and it devises an asymptotically optimal algorithm with an instance-dependent convergence rate.

A recent body of work addresses safety constraints in explore-and-exploit systems. Such constraints arise where, for example, exploration is carried out by individuals whose welfare should be balanced with overall welfare. In this paper, we adopt a model inspired by recent work on a bandit-like setting for recommendations. We contribute to this line of literature by introducing a safety constraint that must be respected in every round: the expected value in each round must be above a given threshold. Under our modeling, the safe explore-and-exploit policy requires careful planning; otherwise, it leads to sub-optimal welfare. We devise an asymptotically optimal algorithm for the setting and analyze its instance-dependent convergence rate.
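The per-round constraint described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the two-arm setup, the function names, and the use of fixed belief means are all assumptions made for illustration. It only shows what "expected value in each round is above a given threshold" means for a randomized choice, and how much probability can be placed on a risky arm when it is mixed with a safe one.

```python
from typing import Sequence


def expected_value(policy: Sequence[float], belief_means: Sequence[float]) -> float:
    """Expected reward of a randomized choice over arms.

    policy[i] is the probability of choosing arm i (hypothetical setup);
    belief_means[i] is the current expected reward of arm i.
    """
    return sum(p * m for p, m in zip(policy, belief_means))


def is_safe(policy: Sequence[float], belief_means: Sequence[float],
            threshold: float, tol: float = 1e-9) -> bool:
    """Check the per-round constraint: expected value >= threshold."""
    return expected_value(policy, belief_means) >= threshold - tol


def max_exploration_prob(safe_mean: float, explore_mean: float,
                         threshold: float) -> float:
    """Largest probability p of playing the exploratory arm such that
    p * explore_mean + (1 - p) * safe_mean >= threshold.
    """
    if safe_mean < threshold:
        raise ValueError("the safe arm itself violates the threshold")
    if explore_mean >= threshold:
        return 1.0  # exploring is already safe on its own
    return (safe_mean - threshold) / (safe_mean - explore_mean)


# Illustrative numbers only (not taken from the paper).
p = max_exploration_prob(safe_mean=0.5, explore_mean=-0.5, threshold=0.0)
print(p, is_safe([1 - p, p], [0.5, -0.5], threshold=0.0))
```

In this toy instance, exploring the risky arm with certainty would violate the constraint, so exploration has to be rationed across rounds. That is the sense in which a safe policy needs planning rather than greedy exploration.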

