LGMLSep 26, 2020

Near-Optimal MNL Bandits Under Risk Criteria

arXiv:2009.12511v32 citations
Originality Highly original
AI Analysis

This addresses risk-aware decision-making in industries and business, offering a novel extension to traditional bandit problems.

The paper tackles the problem of multi-armed bandits under risk criteria, designing algorithms for a broad class of risk measures like conditional value-at-risk and proving they achieve near-optimal regret, with experiments on synthetic and real data.

We study MNL bandits, which is a variant of the traditional multi-armed bandit problem, under risk criteria. Unlike the ordinary expected revenue, risk criteria are more general goals widely used in industries and bussiness. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio and entropy risk, and prove that they suffer a near-optimal regret. As a complement, we also conduct experiments with both synthetic and real data to show the empirical performance of our proposed algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes