MELGOCPRFeb 18, 2024

Online Resource Allocation with Average Budget Constraints

arXiv:2402.11425v51 citationsh-index: 71
Originality Highly original
AI Analysis

This work addresses resource allocation challenges for applications like online anomaly detection and sequential advertising, offering improved regret bounds with practical extensions.

The paper tackles the problem of online resource allocation with average budget constraints, showing that a simple policy achieves O(√T) regret, which is optimal in general, and proposes a novel policy with budget safety buffers that reduces regret to O(ln² T) for discrete arrivals, validated through synthetic and real-world experiments.

We consider the problem of online resource allocation with average budget constraints. At each time point the decision maker makes an irrevocable decision of whether to accept or reject a request before the next request arrives with the goal to maximize the cumulative rewards. In contrast to existing literature requiring the total resource consumption is below a certain level, we require the average resource consumption per accepted request does not exceed a given threshold. This problem can be casted as an online knapsack problem with exogenous random budget replenishment, and can find applications in various fields such as online anomaly detection, sequential advertising, and per-capita public service providers. We start with general arrival distributions and show that a simple policy achieves a $O(\sqrt{T})$ regret. We complement the result by showing that such a regret growing rate is in general not improvable. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, albeit achieve bounded loss in canonical settings, may incur a $Ω(\sqrt{T})$ or even a $Ω(T)$ regret. With the observation that canonical policies tend to be too optimistic and over accept arrivals, we propose a novel policy that incorporates budget safety buffers. It turns out that a little more safety can greatly enhance efficiency -- small additional logarithmic buffers suffice to reduce the regret from $Ω(\sqrt{T})$ or even $Ω(T)$ to $O(\ln^2 T)$. From a practical perspective, we extend the policy to the scenario with continuous arrival distributions, time-dependent information structures, as well as unknown $T$. We conduct both synthetic experiments and empirical applications on a time series data of New York City taxi passengers to validate the performance of our proposed policies.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes