DS LG, Nov 6, 2025

Online Algorithms for Repeated Optimal Stopping: Achieving Both Competitive Ratio and Regret Bounds

Tsubasa Harada, Yasushi Kawase, Hanna Sumita

arXiv:2511.04484v1 | 1.2 | h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the challenge of balancing per-round competitive guarantees against long-term regret in repeated decision-making problems, such as the prophet inequality and the secretary problem. The algorithmic contribution is assessed as an incremental improvement.

The paper tackles the repeated optimal stopping problem with an algorithmic framework that guarantees a competitive ratio in each round and sublinear regret across rounds. For the repeated prophet inequality, it attains a $1/2$-competitive ratio from the second round onward together with an $\tilde{O}(\sqrt{T})$ regret bound. A lower bound of $\Omega(\sqrt{T})$ shows that this regret is nearly optimal in the number of rounds $T$.

We study the repeated optimal stopping problem, which generalizes the classical optimal stopping problem with an unknown distribution to a setting where the same problem is solved repeatedly over $T$ rounds. In this framework, we aim to design algorithms that guarantee a competitive ratio in each round while also achieving sublinear regret across all rounds. Our primary contribution is a general algorithmic framework that achieves these objectives simultaneously for a wide array of repeated optimal stopping problems. The core idea is to dynamically select an algorithm for each round, choosing between two candidates: (1) an empirically optimal algorithm derived from the history of observations, and (2) a sample-based algorithm with a proven competitive ratio guarantee. Based on this approach, we design an algorithm that performs no worse than the baseline sample-based algorithm in every round, while ensuring that the total regret is bounded by $\tilde{O}(\sqrt{T})$. We demonstrate the broad applicability of our framework to canonical problems, including the prophet inequality, the secretary problem, and their variants under adversarial, random, and i.i.d. input models. For example, for the repeated prophet inequality problem, our method achieves a $1/2$-competitive ratio from the second round on and an $\tilde{O}(\sqrt{T})$ regret. Furthermore, we establish a regret lower bound of $\Omega(\sqrt{T})$ even in the i.i.d. model, confirming that our algorithm's performance is almost optimal with respect to the number of rounds.
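The two-candidate idea in the abstract can be illustrated with a toy sketch for a repeated prophet-inequality instance. This is not the authors' algorithm and does not reproduce their guarantees. It assumes full feedback (all values of a round are observed once the round ends) and restricts both candidates to single-threshold rules. It takes the maximum of the previous round's values as the sample-based baseline threshold. The confidence-margin selection rule, the function names, and the `margin_scale` parameter are all hypothetical choices made for illustration.

```python
import math
import random


def run_threshold(values, threshold):
    """Accept the first value that is >= threshold; return 0.0 if none qualifies."""
    for v in values:
        if v >= threshold:
            return v
    return 0.0


def empirical_best_threshold(history):
    """Return the observed value that, used as a fixed threshold, earns the most on history."""
    candidates = sorted({v for rnd in history for v in rnd})
    return max(candidates, key=lambda th: sum(run_threshold(r, th) for r in history))


def choose_threshold(history, margin_scale=1.0):
    """Pick this round's threshold from the rounds observed so far.

    Hypothetical selection rule (an assumption, not taken from the paper):
    use the empirically best threshold only if its average reward on the
    history beats the baseline's by a margin that shrinks like sqrt(log t / t).
    """
    if not history:
        return 0.0  # no data yet: accept the first value (arbitrary choice)
    baseline = max(history[-1])  # sample-based candidate: max of the previous round
    t = len(history)
    if t < 2:
        return baseline
    empirical = empirical_best_threshold(history)
    emp_avg = sum(run_threshold(r, empirical) for r in history) / t
    # Average reward the baseline would have earned on rounds 2..t.
    base_avg = sum(
        run_threshold(history[s], max(history[s - 1])) for s in range(1, t)
    ) / (t - 1)
    scale = max(max(r) for r in history)
    margin = margin_scale * scale * math.sqrt(math.log(t) / t)
    return empirical if emp_avg - margin > base_avg else baseline


def simulate(num_rounds, n, seed=0):
    """Play num_rounds rounds with n values each and return the per-round rewards."""
    rng = random.Random(seed)
    history, rewards = [], []
    for _ in range(num_rounds):
        threshold = choose_threshold(history)
        # Toy input model: the i-th value is uniform on [0, i + 1].
        values = [rng.uniform(0, i + 1) for i in range(n)]
        rewards.append(run_threshold(values, threshold))
        history.append(values)  # full feedback: the whole round is revealed
    return rewards
```

Calling `simulate(20, 4)` returns the 20 per-round rewards. The margin makes the sketch fall back to the baseline threshold while the history is short, and lets it switch to the empirical threshold only once enough rounds have accumulated.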
