LGMF Feb 3, 2025

Exploratory Utility Maximization Problem with Tsallis Entropy

arXiv:2502.01269v1
Originality: Incremental advance
AI Analysis

This work addresses exploration in reinforcement learning for financial decision-making. It is an incremental advance: it builds on classical utility maximization by generalizing the entropy regularizer from Shannon entropy to Tsallis entropy.

The paper tackles the expected utility maximization problem in a reinforcement learning framework, using Tsallis entropy to induce exploration. It finds that the problem can become ill-posed due to over-exploration, and it provides semi-closed-form solutions for two examples: one whose optimal strategy is a Gaussian distribution and one whose optimal strategy is a Wigner semicircle distribution.

We study the expected utility maximization problem with a constant relative risk aversion utility function in a complete market under the reinforcement learning framework. To induce exploration, we introduce the Tsallis entropy regularizer, which generalizes the commonly used Shannon entropy. Unlike the classical Merton problem, which is always well-posed and admits closed-form solutions, we find that the exploratory utility maximization problem is ill-posed in certain cases due to over-exploration. With a carefully selected primary temperature function, we investigate two specific examples, for which we fully characterize well-posedness and provide semi-closed-form solutions. Interestingly, one example has the well-known Gaussian distribution as the optimal strategy, while the other features the rare Wigner semicircle distribution, which is equivalent to a scaled Beta distribution. The means of the two optimal exploratory policies coincide with that of the classical counterpart. In addition, we examine the convergence of the value function and the optimal exploratory strategy as exploration vanishes. Finally, we design a reinforcement learning algorithm and conduct numerical experiments to demonstrate the advantages of reinforcement learning.
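Two claims in the abstract can be checked numerically: that Tsallis entropy generalizes Shannon entropy, and that the Wigner semicircle distribution is a scaled Beta distribution. The sketch below is illustrative only. It assumes the standard textbook form of Tsallis entropy, S_q(p) = (1 - ∫ p(x)^q dx) / (q - 1), which may differ from the paper's normalization and temperature function. All function names and the parameters `q` and `radius` are hypothetical and do not come from the paper.

```python
import math


def gaussian_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def semicircle_pdf(x, radius=2.0):
    """Wigner semicircle density on [-radius, radius]."""
    if abs(x) >= radius:
        return 0.0
    return 2.0 / (math.pi * radius ** 2) * math.sqrt(radius ** 2 - x ** 2)


def scaled_beta_pdf(x, radius=2.0):
    """Density of radius * (2 * B - 1), where B ~ Beta(3/2, 3/2)."""
    u = (x / radius + 1.0) / 2.0  # map [-radius, radius] back to [0, 1]
    if u <= 0.0 or u >= 1.0:
        return 0.0
    beta_norm = math.gamma(1.5) ** 2 / math.gamma(3.0)  # B(3/2, 3/2) = pi / 8
    # The factor 1 / (2 * radius) is the Jacobian of the affine change of variable.
    return math.sqrt(u * (1.0 - u)) / beta_norm / (2.0 * radius)


def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def shannon_entropy(pdf, lo, hi):
    """Differential Shannon entropy: -integral of p * ln(p)."""
    def integrand(x):
        p = pdf(x)
        return -p * math.log(p) if p > 0.0 else 0.0
    return integrate(integrand, lo, hi)


def tsallis_entropy(pdf, lo, hi, q):
    """Tsallis entropy in the assumed standard form, for q != 1."""
    return (1.0 - integrate(lambda x: pdf(x) ** q, lo, hi)) / (q - 1.0)


if __name__ == "__main__":
    # As q approaches 1, Tsallis entropy approaches Shannon entropy.
    print("Shannon (Gaussian):", shannon_entropy(gaussian_pdf, -10.0, 10.0))
    for q in (2.0, 1.1, 1.001):
        print("Tsallis q =", q, ":", tsallis_entropy(gaussian_pdf, -10.0, 10.0, q))
    # The semicircle and the scaled Beta(3/2, 3/2) densities agree pointwise.
    for x in (-1.5, 0.0, 0.7):
        print(x, semicircle_pdf(x), scaled_beta_pdf(x))
```

For a standard Gaussian, the Shannon value should be close to 0.5 * ln(2πe), and the Tsallis values should approach it as `q` moves toward 1. The pointwise agreement of the two densities follows from the change of variable x = radius * (2u - 1) applied to the Beta(3/2, 3/2) density.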

