Deep Reinforcement Learning in Factor Investment
This work makes deep reinforcement learning practical for institutional, low-turnover portfolio management by improving factor-aware representation learning.
The paper addresses the problem of applying deep reinforcement learning to low-frequency factor portfolio construction, where stocks entering and exiting the investable universe create a high-dimensional, unbalanced state space. The resulting method delivers a 24.6% compound return and a Sharpe ratio of 0.94 out of sample.
Deep reinforcement learning has shown promise in trade execution, yet its use in low-frequency factor portfolio construction remains under-explored. A key obstacle is the high-dimensional, unbalanced state space created by stocks that enter and exit the investable universe. We introduce Conditional Auto-encoded Factor-based Portfolio Optimisation (CAFPO), which compresses stock-level returns into a small set of latent factors conditioned on 94 firm-specific characteristics. The factors feed a DRL agent implemented with both PPO and DDPG to generate continuous long-short weights. On 20 years of U.S. equity data (2000--2020), CAFPO outperforms equal-weight, value-weight, Markowitz, vanilla DRL, and Fama--French-driven DRL, delivering a 24.6\% compound return and a Sharpe ratio of 0.94 out of sample. SHAP analysis further reveals economically intuitive factor attributions. Our results demonstrate that factor-aware representation learning can make DRL practical for institutional, low-turnover portfolio management.
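To illustrate why conditioning on characteristics helps with an unbalanced universe, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a purely linear conditional autoencoder, K = 5 latent factors, and randomly initialised weights (`w_factor`, `w_beta`); the function names and all data are hypothetical. The point it shows is that the factor vector has the same length K whether 300 or 450 stocks are investable in a given period, so a DRL agent could consume it as a fixed-size state.

```python
import numpy as np


def latent_factors(returns, chars, w_factor):
    """Compress one cross-section of returns into K latent factors.

    returns  : (N_t,) returns of the stocks investable in period t
    chars    : (N_t, P) firm characteristics for the same stocks
    w_factor : (P, K) hypothetical encoder weights (learned in practice)
    """
    # Least-squares projection of returns onto characteristics yields a
    # length-P vector regardless of how many stocks are present.
    managed, *_ = np.linalg.lstsq(chars, returns, rcond=None)
    return managed @ w_factor  # shape (K,)


def reconstruct(chars, w_beta, factors):
    """Decode factors back into stock-level returns via characteristic-driven betas."""
    betas = chars @ w_beta  # (N_t, K) factor loadings, one row per stock
    return betas @ factors  # (N_t,)


rng = np.random.default_rng(0)
P, K = 94, 5  # 94 characteristics as in the abstract; K = 5 is an assumption
w_factor = rng.normal(size=(P, K))  # stand-ins for trained weights
w_beta = rng.normal(size=(P, K))

states, recon_sizes = [], []
for n_stocks in (300, 450):  # universe size changes between periods
    chars = rng.normal(size=(n_stocks, P))
    rets = rng.normal(scale=0.05, size=n_stocks)
    factors = latent_factors(rets, chars, w_factor)
    states.append(factors)
    recon_sizes.append(reconstruct(chars, w_beta, factors).shape[0])

print([s.shape for s in states], recon_sizes)
```

In a trained model the two weight matrices would be fitted by minimising the reconstruction error between `reconstruct(...)` and the realised returns; here they are random, so only the shapes are meaningful.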