Functional Bandits
This addresses a gap in best-arm identification techniques for applications such as risk management and information theory, though it appears incremental as it builds on existing bandit frameworks.
The paper tackles the functional bandit problem of optimizing known functionals of unknown arm-reward distributions, which arises in areas like natural language processing and risk-averse decision-making, and proposes a method combining functional estimation and arm elimination that achieves provably efficient performance guarantees.
We introduce the functional bandit problem, where the objective is to find an arm that optimises a known functional of the unknown arm-reward distributions. These problems arise in many settings such as maximum entropy methods in natural language processing, and risk-averse decision-making, but current best-arm identification techniques fail in these domains. We propose a new approach, that combines functional estimation and arm elimination, to tackle this problem. This method achieves provably efficient performance guarantees. In addition, we illustrate this method on a number of important functionals in risk management and information theory, and refine our generic theoretical results in those cases.