LG DS GT MLJun 22, 2020

Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning

arXiv:2006.12367v45.815 citations

Originality Highly original

AI Analysis

This work addresses the challenge of adversarial rewards in structured action spaces like bandits, which is significant for applications such as dynamic pricing and auction tuning, though it builds incrementally on prior stochastic methods.

The paper tackles the adversarial version of Lipschitz bandits, where rewards are not stochastic, by introducing the Adversarial Zooming algorithm, which achieves instance-dependent regret bounds and recovers worst-case optimal regret. It applies this to dynamic pricing and auction tuning under adversarial rewards, extending to multi-product settings without requiring smoothness assumptions.

Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the $[0,1]$ interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually ``zooms in'' on the more promising regions thereof. The goal is to take advantage of ``nicer'' problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarial rewards is not. We provide the first algorithm (\emph{Adversarial Zooming}) for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. We apply our algorithm to several fundamental applications -- including dynamic pricing and auction reserve tuning -- all under adversarial reward models. While these domains often violate Lipschitzness, our analysis only requires a weaker version thereof, allowing for meaningful regret bounds without additional smoothness assumptions. Notably, we extend our results to multi-product dynamic pricing with non-smooth reward structures, a setting which does not even satisfy one-sided Lipschitzness.

View on arXiv PDF

Similar