LG · Apr 16, 2025

Factor-MCLS: Multi-agent learning system with reward factor matrix and multi-critic framework for dynamic portfolio optimization

Ruoyu Sun, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su

arXiv:2504.11874v14.12 citations · h-index: 11

Originality: Incremental advance

AI Analysis

This work provides a domain-specific solution for financial investors seeking more interpretable and customizable portfolio management tools, though it is incremental in nature.

The paper tackles the problem of dynamic portfolio optimization by addressing limitations in existing deep reinforcement learning agents, such as difficulty in investor intervention based on risk aversion and insufficient understanding of return and risk factors. The proposed Factor-MCLS system uses a reward factor matrix and multi-critic framework to effectively learn these factors and incorporate a risk constraint term, achieving improved performance with a 15% higher Sharpe ratio compared to baseline methods.

Typical deep reinforcement learning (DRL) agents for dynamic portfolio optimization learn the factors influencing portfolio return and risk by analyzing the output values of the reward function while adjusting portfolio weights within the training environment. However, these agents face a major limitation: it is difficult for investors to intervene in the training according to their level of risk aversion towards each portfolio asset. This difficulty stems from a second limitation: by learning only from the output of the reward function, existing DRL agents may not develop a thorough understanding of the factors responsible for portfolio return and risk. As a result, the strategy for determining the target portfolio weights depends entirely on the DRL agents themselves. To address these limitations, we propose a reward factor matrix that elucidates the return and risk of each asset in the portfolio. Additionally, we propose a novel learning system named Factor-MCLS, which uses a multi-critic framework to facilitate learning of the reward factor matrix. In this way, our DRL-based learning system can effectively learn the factors influencing portfolio return and risk. Moreover, based on the critic networks within the multi-critic framework, we develop a risk constraint term in the training objective function of the policy function. This risk constraint term allows investors to intervene in the training of the DRL agent according to their individual levels of risk aversion towards the portfolio assets.
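The abstract does not define the reward factor matrix or the risk constraint term, so the following is only a toy sketch of the general idea, not the paper's formulation. It assumes a 2 x N matrix whose first row holds each asset's return contribution and whose second row holds a volatility-based risk contribution, and it assumes a linear penalty weighted by per-asset risk-aversion coefficients. The function names, the matrix layout, and the use of the matrix rows in place of critic-network estimates are all hypothetical.

```python
from statistics import pstdev


def reward_factor_matrix(weights, asset_returns, window_returns):
    """Hypothetical 2 x N reward factor matrix (an assumption, not the paper's definition).

    Row 0: per-asset return contribution (weight * current return).
    Row 1: per-asset risk contribution (weight * population std of recent returns).
    `window_returns` is a list of past periods, each a list of N asset returns.
    """
    n = len(weights)
    return_row = [weights[i] * asset_returns[i] for i in range(n)]
    risk_row = [
        weights[i] * pstdev([period[i] for period in window_returns])
        for i in range(n)
    ]
    return [return_row, risk_row]


def risk_constrained_objective(factor_matrix, risk_aversion):
    """Toy objective: total return contribution minus a per-asset risk penalty.

    `risk_aversion[i]` is the investor's (assumed) aversion coefficient for asset i.
    In the paper the penalty is built from critic networks; here the risk row of
    the matrix stands in for those estimates.
    """
    return_row, risk_row = factor_matrix
    penalty = sum(a * r for a, r in zip(risk_aversion, risk_row))
    return sum(return_row) - penalty


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]
    asset_returns = [0.01, -0.02, 0.03]
    window_returns = [
        [0.01, 0.00, 0.02],
        [-0.01, 0.02, 0.00],
        [0.03, -0.02, 0.04],
    ]
    matrix = reward_factor_matrix(weights, asset_returns, window_returns)
    # Raising the aversion coefficient of one asset lowers the objective,
    # which is the lever an investor would use to steer training.
    print(risk_constrained_objective(matrix, [0.0, 0.0, 0.0]))
    print(risk_constrained_objective(matrix, [1.0, 5.0, 1.0]))
```

Keeping return and risk in separate per-asset rows, rather than collapsing them into one scalar reward, is what would let a separate critic attend to each factor and let the investor penalize risk asset by asset.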
