OC LG MF ML · Aug 18, 2024

Exploratory Optimal Stopping: A Singular Control Formulation

Jodi Dianetti, Giorgio Ferrari, Renyuan Xu

arXiv:2408.09335v2 · 17 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses optimal stopping problems in reinforcement learning, offering a novel formulation and algorithm, but it appears incremental as it builds on existing control and regularization techniques.

The paper tackles continuous-time optimal stopping problems by formulating them with randomized stopping times and introducing entropy regularization to encourage exploration, deriving a unique optimal exploratory strategy and proposing a reinforcement learning algorithm with proven policy improvement and convergence.

This paper explores continuous-time and state-space optimal stopping problems from a reinforcement learning perspective. We begin by formulating the stopping problem using randomized stopping times, where the decision maker's control is represented by the probability of stopping within a given time, specifically a bounded, non-decreasing, càdlàg control process. To encourage exploration and facilitate learning, we introduce a regularized version of the problem by penalizing it with the cumulative residual entropy of the randomized stopping time. The regularized problem takes the form of an (n+1)-dimensional degenerate singular stochastic control problem with finite fuel. We address this through the dynamic programming principle, which enables us to identify the unique optimal exploratory strategy. For the specific case of a real option problem, we derive a semi-explicit solution to the regularized problem, allowing us to assess the impact of entropy regularization and analyze the vanishing entropy limit. Finally, we propose a reinforcement learning algorithm based on policy iteration. We show both policy improvement and policy convergence results for our proposed algorithm.
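To make the objects in the abstract concrete, here is a minimal Python sketch that evaluates an entropy-regularized objective for one fixed randomized stopping rule by Monte Carlo. The control ξ_t is the probability of having stopped by time t, so it is kept non-decreasing and inside [0, 1]. The following are all assumptions made for illustration, not taken from the paper: the state is a geometric Brownian motion; the payoff is a call-type (X_t - K)^+ standing in for a generic real option; the regularizer is the discounted integral of -(1 - ξ_t) log(1 - ξ_t), which is the cumulative residual entropy integrand written with survival probability 1 - ξ_t; and the names `regularized_value`, `target_stop_prob`, `cre_integrand`, `lam`, `lower`, `upper` are hypothetical. The sketch only evaluates a given policy. It is neither the authors' optimal exploratory strategy nor their policy-iteration algorithm.

```python
import math
import random


def cre_integrand(xi: float) -> float:
    """-(1 - xi) * log(1 - xi): cumulative residual entropy integrand when xi is
    the probability of having stopped (1 - xi is the survival probability).
    Set to 0 at xi = 1, the limit of -s*log(s) as s -> 0."""
    survival = 1.0 - xi
    if survival <= 0.0:
        return 0.0
    return -survival * math.log(survival)


def target_stop_prob(x: float, lower: float, upper: float) -> float:
    """Hypothetical exploratory rule (assumption, not from the paper): the
    stopping probability rises linearly from 0 at x = lower to 1 at x = upper."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def regularized_value(x0, strike, mu, sigma, rho, lam, lower, upper,
                      horizon=5.0, n_steps=200, n_paths=1000, seed=0):
    """Monte Carlo estimate (assumed objective) of
        E[ int e^{-rho t} (X_t - K)^+ dxi_t + lam * int e^{-rho t} H(xi_t) dt ]
    on [0, horizon], with X a geometric Brownian motion and H = cre_integrand.
    Probability mass not stopped by the horizon earns no payoff."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, xi, path_value = x0, 0.0, 0.0
        for k in range(n_steps):
            disc = math.exp(-rho * k * dt)
            # Singular (finite-fuel) control: xi only ratchets upward, stays in [0, 1].
            new_xi = max(xi, target_stop_prob(x, lower, upper))
            # Payoff earned by the probability mass stopped at this step (dxi_t).
            path_value += disc * max(x - strike, 0.0) * (new_xi - xi)
            xi = new_xi
            # Entropy bonus accrued over [t, t + dt].
            path_value += lam * disc * cre_integrand(xi) * dt
            # Exact log-normal update of the geometric Brownian motion.
            z = rng.gauss(0.0, 1.0)
            x *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        total += path_value
    return total / n_paths


if __name__ == "__main__":
    for lam in (0.0, 0.5):
        v = regularized_value(x0=1.0, strike=1.0, mu=0.02, sigma=0.3,
                              rho=0.05, lam=lam, lower=1.1, upper=1.6)
        print(f"lam={lam}: estimated value {v:.4f}")
```

With the same seed both runs see identical paths and the same ξ, so the λ = 0.5 estimate cannot fall below the λ = 0 one: the entropy bonus is nonnegative and is strictly positive only while 0 < ξ_t < 1, which is exactly when the rule is still "exploring". A policy-iteration scheme would alternate an evaluation step like this with an update of the stopping rule; that update is not sketched here.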
