PM CE LG OC · Apr 25, 2019

Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

arXiv:1904.11392v2 · 10 citations
Originality: Highly original
AI Analysis

This paper addresses portfolio optimization for investors with a novel RL-based solution, though it builds incrementally on the classical mean-variance approach.

The paper tackles the continuous-time mean-variance portfolio selection problem by formulating it as an entropy-regularized, relaxed stochastic control problem. It proves that the optimal feedback policy is Gaussian with time-decaying variance, and develops an RL algorithm that, in the authors' simulations, outperforms both an adaptive-control-based method and a deep-neural-network-based algorithm by a large margin.

We approach the continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV and the classical MV, including the solvability equivalence and the convergence as exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm outperforms both an adaptive control based method and a deep neural networks based algorithm by a large margin in our simulations.
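
To make the "Gaussian, with time-decaying variance" result concrete, here is a minimal sketch of sampling an allocation from such a policy. The abstract does not give the functional form, so the mean and variance expressions below are assumptions: a mean linear in the gap between wealth `x` and a target-related level `w`, and a variance proportional to the exploration weight `lam` that shrinks as `t` approaches the horizon `T`. The names `rho`, `sigma`, `lam`, `w`, and all numeric values are hypothetical and chosen for illustration only.

```python
import math
import random


def gaussian_policy_params(t, x, w, T, rho, sigma, lam):
    """Mean and variance of an exploratory Gaussian allocation policy.

    ASSUMED form (not stated in the abstract):
      mean     = -(rho / sigma) * (x - w)
      variance = lam / (2 * sigma**2) * exp(rho**2 * (T - t))
    where rho is a Sharpe-ratio-like parameter, sigma the volatility,
    and lam the exploration (entropy) weight.
    """
    mean = -(rho / sigma) * (x - w)
    variance = lam / (2.0 * sigma ** 2) * math.exp(rho ** 2 * (T - t))
    return mean, variance


def sample_allocation(t, x, w, T, rho, sigma, lam, rng):
    """Draw one risky-asset allocation from the assumed Gaussian policy."""
    mean, variance = gaussian_policy_params(t, x, w, T, rho, sigma, lam)
    return rng.gauss(mean, math.sqrt(variance))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    # Hypothetical parameter values, for illustration only.
    T, rho, sigma, lam, x, w = 1.0, 0.5, 0.2, 0.1, 1.0, 1.2
    for t in (0.0, 0.5, 1.0):
        mean, variance = gaussian_policy_params(t, x, w, T, rho, sigma, lam)
        u = sample_allocation(t, x, w, T, rho, sigma, lam, rng)
        print(f"t={t:.1f} mean={mean:.3f} variance={variance:.3f} sample={u:.3f}")
```

Under these assumptions the variance decreases monotonically in `t`, so exploration is heaviest early in the horizon, and setting `lam` to zero collapses the policy to a deterministic allocation, which mirrors the convergence to the classical MV solution as the exploration weight decays to zero.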

Code Implementations: 1 repo