LGAIDec 2, 2021

Risk-Aware Algorithms for Combinatorial Semi-Bandits

arXiv:2112.01141v12 citations
Originality Incremental advance
AI Analysis

This addresses risk management in decision-making under uncertainty for applications like finance or resource allocation, representing an incremental extension to existing bandit frameworks.

The paper tackles the problem of making combinatorial multi-armed bandits risk-aware by maximizing Conditional Value-at-Risk (CVaR) for worst-case rewards, proposing new algorithms with regret bounds for Gaussian and bounded reward cases.

In this paper, we study the stochastic combinatorial multi-armed bandit problem under semi-bandit feedback. While much work has been done on algorithms that optimize the expected reward for linear as well as some general reward functions, we study a variant of the problem, where the objective is to be risk-aware. More specifically, we consider the problem of maximizing the Conditional Value-at-Risk (CVaR), a risk measure that takes into account only the worst-case rewards. We propose new algorithms that maximize the CVaR of the rewards obtained from the super arms of the combinatorial bandit for the two cases of Gaussian and bounded arm rewards. We further analyze these algorithms and provide regret bounds. We believe that our results provide the first theoretical insights into combinatorial semi-bandit problems in the risk-aware case.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes