LG AI MLSep 4, 2019

Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback

Arun Verma, Manjesh K. Hanawal, Arun Rajkumar, Raman Sankaran

arXiv:1909.01504v38.119 citationsHas Code

Originality Incremental advance

AI Analysis

This work addresses resource allocation challenges in systems with censored feedback, such as in operations research or online advertising, but it is incremental as it builds on known bandit frameworks.

The paper tackles the problem of resource allocation with censored feedback, where losses are zero if allocations exceed unknown thresholds, aiming to minimize expected loss by learning feasible allocations. It establishes equivalences to existing bandit problems and derives optimal algorithms, validated on synthetic data with performance guarantees.

In this paper, we study censored Semi-Bandits, a novel variant of the semi-bandits problem. The learner is assumed to have a fixed amount of resources, which it allocates to the arms at each time step. The loss observed from an arm is random and depends on the amount of resources allocated to it. More specifically, the loss equals zero if the allocation for the arm exceeds a constant (but unknown)threshold that can be dependent on the arm. Our goal is to learn a feasible allocation that minimizes the expected loss. The problem is challenging because the loss distribution and threshold value of each arm are unknown. We study this novel setting by establishing its `equivalence' to Multiple-Play Multi-Armed Bandits(MP-MAB) and Combinatorial Semi-Bandits. Exploiting these equivalences, we derive optimal algorithms for our setting using existing algorithms for MP-MABand Combinatorial Semi-Bandits. Experiments on synthetically generated data validate performance guarantees of the proposed algorithms.

View on arXiv PDF Code

Similar