Risk-Averse Stochastic Convex Bandit
This paper addresses risk-aversion in the online convex bandit problem, which matters for applications such as clinical trials and finance. The authors present it as the first attempt in this area.
The authors study online convex optimization with bandit feedback for a risk-averse decision maker, motivated by applications in clinical trials and finance. They propose two algorithms: a descent-type algorithm that is easy to implement, and a second algorithm that achieves (almost) optimal regret bounds with respect to the number of rounds.
Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first is a descent-type algorithm that is easy to implement. The second, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk-aversion in the online convex bandit problem.