Stochastic Bandits for Crowdsourcing and Multi-Platform Autobidding
This work addresses budget allocation challenges in crowdsourcing and multi-platform autobidding, offering incremental algorithmic improvements with theoretical guarantees.
The paper tackles the problem of allocating a fixed budget across multiple tasks or auctions with stochastic rewards, where each reward's unlocking probability depends on the allocated fraction. It presents an algorithm achieving expected regret of order K√T (matching a lower bound) and improved bounds of order K(log T)^2 under diminishing-returns conditions.
Motivated by applications in crowdsourcing, where a fixed sum of money is split among $K$ workers, and autobidding, where a fixed budget is used to bid in $K$ simultaneous auctions, we define a stochastic bandit model where arms belong to the $K$-dimensional probability simplex and represent the fraction of budget allocated to each task/auction. The reward in each round is the sum of $K$ stochastic rewards, where each of these rewards is unlocked with a probability that varies with the fraction of the budget allocated to that task/auction. We design an algorithm whose expected regret after $T$ steps is of order $K\sqrt{T}$ (up to log factors) and prove a matching lower bound. Improved bounds of order $K (\log T)^2$ are shown when the function mapping budget to probability of unlocking the reward (i.e., terminating the task or winning the auction) satisfies additional diminishing-returns conditions.