OC LG · Jul 4, 2024

Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy

Lijun Bo, Yijie Huang, Xiang Yu, Tingting Zhang

arXiv:2407.03888v4

Originality: Incremental advance

AI Analysis

This work addresses continuous-time control problems in finance, such as optimal liquidation, by applying Tsallis entropy regularization to continuous-time q-learning. The advance is incremental, as it builds on existing q-learning methods.

The paper tackles continuous-time reinforcement learning in jump-diffusion models by developing q-learning algorithms under Tsallis entropy regularization. Under this regularization, optimal policies are not necessarily Gibbs measures. In examples such as an optimal liquidation problem they can be characterized explicitly, and numerical experiments demonstrate satisfactory performance.
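To see why a Tsallis-regularized optimal policy need not be a Gibbs measure, here is a sketch of the pointwise optimization under assumptions of our own. We take a one-dimensional action a, a given q-function Q(a), a temperature γ > 0, and one common form of the Tsallis entropy with index p > 1. The paper's exact normalization and notation may differ. The multiplier ψ enforces ∫π = 1, and μ(a) ≥ 0 enforces π ≥ 0:

```latex
% Assumed notation (ours): policy density \pi, q-function Q, temperature \gamma > 0, index p > 1
S_p(\pi) = \frac{1}{p-1}\Big(1 - \int \pi(a)^p \, da\Big),
\qquad \lim_{p \to 1} S_p(\pi) = -\int \pi(a) \log \pi(a) \, da

\max_{\pi \ge 0,\; \int \pi = 1} \; \int Q(a)\,\pi(a)\,da + \gamma\, S_p(\pi)

% KKT stationarity and complementary slackness, pointwise in a
Q(a) - \frac{\gamma p}{p-1}\,\pi(a)^{p-1} - \psi + \mu(a) = 0,
\qquad \mu(a) \ge 0, \qquad \mu(a)\,\pi(a) = 0

% Resulting optimizer; \psi is fixed by \int \pi^*(a)\,da = 1
\pi^*(a) = \Big[\frac{p-1}{\gamma p}\big(Q(a) - \psi\big)\Big]_+^{1/(p-1)}

% Shannon limit p \to 1: the Gibbs measure, positive wherever Q is finite
\pi^*_{\mathrm{Shannon}}(a) = \frac{\exp\big(Q(a)/\gamma\big)}{\int \exp\big(Q(b)/\gamma\big)\,db}
```

The optimizer vanishes wherever Q(a) ≤ ψ, so its support is the superlevel set {Q > ψ}. For a concave quadratic Q, this set is a bounded interval. This illustrates one way a compactly supported, non-Gibbs policy can arise.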

This paper studies continuous-time reinforcement learning in jump-diffusion models, featuring q-learning (the continuous-time counterpart of Q-learning) under Tsallis entropy regularization. In contrast to Shannon entropy, the general form of Tsallis entropy means that the optimal policy is not necessarily a Gibbs measure. A Lagrange multiplier and the KKT conditions are therefore needed to ensure that the learned policy is a probability density function. As a consequence, the characterization of the optimal policy in terms of the q-function also involves a Lagrange multiplier. In response, we establish the martingale characterization of the q-function and devise two q-learning algorithms, depending on whether or not the Lagrange multiplier can be derived explicitly. In the latter case, we consider different parameterizations of the optimal q-function and the optimal policy and update them alternately in an Actor-Critic manner. We also study two numerical examples, namely an optimal liquidation problem in dark pools and a non-LQ control problem. Interestingly, in these examples the optimal policies under Tsallis entropy regularization can be characterized explicitly as distributions concentrated on a compact support. The satisfactory performance of our q-learning algorithms is illustrated in each example.
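When the Lagrange multiplier has no closed form, it can still be computed numerically for a given q-function. Below is a minimal sketch under our own assumptions, not the authors' algorithm. The action space is a uniform one-dimensional grid, and the q-function is known on that grid. The Tsallis entropy is taken to be S_p(π) = (1 − ∫π^p)/(p − 1) with p > 1, so the policy has the form π(a) = [((p − 1)/(γp))(Q(a) − ψ)]_+^{1/(p−1)}. Because the total mass of this density is nonincreasing in ψ, bisection recovers ψ. The helpers `tsallis_policy` and `gibbs_policy` are hypothetical names used only for illustration.

```python
import numpy as np


def tsallis_policy(q_values, da, gamma, p, tol=1e-12, max_iter=200):
    """Tsallis-regularized policy on a uniform 1-D action grid (sketch).

    Assumes S_p(pi) = (1 - int pi^p) / (p - 1) with p > 1, so the maximizer of
    int Q pi + gamma * S_p(pi) is pi(a) = [c * (Q(a) - psi)]_+^(1/(p-1)),
    with c = (p - 1) / (gamma * p) and psi the normalization multiplier.
    Returns (density values on the grid, psi).
    """
    if p <= 1.0 or gamma <= 0.0:
        raise ValueError("need p > 1 and gamma > 0")
    c = (p - 1.0) / (gamma * p)

    def density(psi):
        # Actions with Q(a) <= psi get zero mass (KKT: mu(a) = psi - Q(a) >= 0).
        return (c * np.maximum(q_values - psi, 0.0)) ** (1.0 / (p - 1.0))

    def mass(psi):
        # Riemann-sum approximation of the integral of the density.
        return density(psi).sum() * da

    # mass(psi) is nonincreasing in psi and equals 0 at psi = max Q,
    # so bracket the root by moving the lower end down until mass >= 1.
    hi = float(q_values.max())
    step = 1.0
    lo = hi - step
    while mass(lo) < 1.0:
        step *= 2.0
        lo = hi - step

    # Bisection on the normalization condition mass(psi) = 1.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    psi = 0.5 * (lo + hi)
    return density(psi), psi


def gibbs_policy(q_values, da, gamma):
    """Shannon-regularized (Gibbs) policy on the same grid, for comparison."""
    w = np.exp((q_values - q_values.max()) / gamma)  # shift for numerical stability
    return w / (w.sum() * da)


if __name__ == "__main__":
    # Hypothetical concave quadratic q-function on the grid [-3, 3].
    a = np.linspace(-3.0, 3.0, 6001)
    da = a[1] - a[0]
    q = -(a - 0.5) ** 2
    pi, psi = tsallis_policy(q, da, gamma=0.5, p=2.0)
    support = a[pi > 0]
    print(f"psi = {psi:.4f}, support ~ [{support.min():.3f}, {support.max():.3f}]")
    print("Gibbs policy positive everywhere:", bool(np.all(gibbs_policy(q, da, 0.5) > 0)))
```

In the demo, the Tsallis policy is a truncated parabola on a bounded interval, while the Gibbs policy is positive on the whole grid. Bisection is used only because the mass is monotone in ψ, which makes it a simple and robust choice. According to the abstract, when ψ is not explicit the paper instead parameterizes the q-function and the policy separately and updates them in an Actor-Critic manner.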
