OC · LG · MF · ML · Aug 18, 2024

Exploratory Optimal Stopping: A Singular Control Formulation

arXiv:2408.09335v2 · 17 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses optimal stopping problems in reinforcement learning, offering a novel formulation and algorithm, but it appears incremental as it builds on existing control and regularization techniques.

The paper tackles continuous-time optimal stopping problems by formulating them with randomized stopping times and adding entropy regularization to encourage exploration. It derives the unique optimal exploratory strategy and proposes a reinforcement learning algorithm with proven policy improvement and policy convergence.

This paper explores optimal stopping problems in continuous time and continuous state space from a reinforcement learning perspective. We begin by formulating the stopping problem using randomized stopping times, where the decision maker's control is represented by the probability of stopping within a given time: specifically, a bounded, non-decreasing, càdlàg control process. To encourage exploration and facilitate learning, we introduce a regularized version of the problem by penalizing it with the cumulative residual entropy of the randomized stopping time. The regularized problem takes the form of an (n+1)-dimensional degenerate singular stochastic control problem with finite fuel. We address this through the dynamic programming principle, which enables us to identify the unique optimal exploratory strategy. For the specific case of a real option problem, we derive a semi-explicit solution to the regularized problem, allowing us to assess the impact of entropy regularization and analyze the vanishing entropy limit. Finally, we propose a reinforcement learning algorithm based on policy iteration, and we show both policy improvement and policy convergence results for it.
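To make the regularizer more concrete, here is a minimal Python sketch that numerically evaluates a cumulative residual entropy for a randomized stopping time. It assumes the standard definition, minus the integral of S(t) log S(t) with S(t) = 1 - ξ_t the probability of not having stopped by time t; the abstract does not state the exact penalty, which may include discounting or other weights. The function name, the time grid, and the example stopping rate are hypothetical choices made for illustration only.

```python
import numpy as np


def cumulative_residual_entropy(t, xi):
    """Approximate -integral of (1 - xi_t) * log(1 - xi_t) dt on the grid t.

    t  : increasing time grid
    xi : probability of having stopped by each time (non-decreasing, in [0, 1])
    Assumption: plain, undiscounted cumulative residual entropy.
    """
    t = np.asarray(t, dtype=float)
    xi = np.asarray(xi, dtype=float)
    if xi.min() < 0.0 or xi.max() > 1.0 or np.any(np.diff(xi) < -1e-12):
        raise ValueError("xi must be non-decreasing with values in [0, 1]")

    survival = 1.0 - xi
    # Use the convention 0 * log(0) = 0 where the survival probability vanishes.
    integrand = np.zeros_like(survival)
    positive = survival > 0.0
    integrand[positive] = -survival[positive] * np.log(survival[positive])

    # Trapezoidal rule, written out by hand to stay NumPy-version agnostic.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))


grid = np.linspace(0.0, 40.0, 400_001)

# Randomized rule: stop at an exponential time with (hypothetical) rate 2.
rate = 2.0
cre_exponential = cumulative_residual_entropy(grid, 1.0 - np.exp(-rate * grid))

# Deterministic rule: stop exactly at time 1, so xi jumps from 0 to 1.
cre_deterministic = cumulative_residual_entropy(grid, (grid >= 1.0).astype(float))

print(cre_exponential, cre_deterministic)
```

For the exponential rule the integral can be computed by hand and equals 1/rate, while a deterministic stopping time has survival probability 0 or 1 everywhere and therefore zero entropy. Under this assumed definition, an entropy term of this kind favors genuinely randomized stopping rules, which matches the exploration motive described in the abstract.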
