AI · Feb 27

Portfolio Reinforcement Learning with Scenario-Context Rollout

Vanya Priscillia Bendatu, Yao Lu

arXiv:2602.24037v1

Originality: Incremental advance

AI Analysis

This paper addresses the degradation of portfolio rebalancing performance during market stress events. It is a domain-specific, incremental advance.

The paper tackles the degradation of portfolio rebalancing policies under market regime shifts. It proposes a macro-conditioned scenario-context rollout method that generates plausible next-day return scenarios under stress events. In out-of-sample evaluations across 31 portfolio universes, the method improves Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53%.
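For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Sharpe ratio and maximum drawdown are commonly computed from a series of daily returns. The function names, the zero default risk-free rate, and the 252-day annualization factor are illustrative assumptions; the paper's exact evaluation conventions are not stated in this listing.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a daily return series.

    Assumption: 252 trading days per year and a constant daily
    risk-free rate; these are common conventions, not the paper's.
    """
    excess = [r - risk_free_daily for r in daily_returns]
    sd = stdev(excess)  # sample standard deviation, needs >= 2 points
    if sd == 0:
        return float("nan")  # undefined when returns never vary
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of compounded wealth, as a fraction."""
    wealth = 1.0
    peak = 1.0
    worst = 0.0
    for r in daily_returns:
        wealth *= 1.0 + r          # compound the daily return
        peak = max(peak, wealth)   # running high-water mark
        worst = max(worst, 1.0 - wealth / peak)
    return worst
```

A drawdown of 0.5 means the portfolio lost half its value from a previous peak, so a "53% lower maximum drawdown" is a relative reduction of this quantity versus a baseline.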

Market regime shifts induce distribution shifts that can degrade the performance of portfolio rebalancing policies. We propose macro-conditioned scenario-context rollout (SCR), which generates plausible next-day multivariate return scenarios under stress events. However, doing so raises a new challenge: history never reveals what would have happened differently. As a result, incorporating scenario-based rewards from rollouts introduces a reward-transition mismatch in temporal-difference learning, destabilizing RL critic training. We analyze this inconsistency and show that it leads to a mixed evaluation target. Guided by this analysis, we construct a counterfactual next state using the rollout-implied continuations and augment the critic agent's bootstrap target. This stabilizes learning and provides a viable bias-variance tradeoff. In out-of-sample evaluations across 31 distinct universes of U.S. equity and ETF portfolios, our method improves Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53% compared with classic and RL-based portfolio rebalancing baselines.
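The reward-transition mismatch can be illustrated with a small sketch of a one-step temporal-difference target. This is not the paper's formulation: the blending weight `beta`, the function names, and the convex-combination form are hypothetical, chosen only to show how a scenario-based reward paired with the historical next state mixes two different transitions, and how a counterfactual next-state value could enter the bootstrap term.

```python
def td_target(reward, next_value, gamma=0.99):
    """Standard one-step TD target: r + gamma * V(s')."""
    return reward + gamma * next_value


def augmented_target(scenario_reward, v_next_hist, v_next_cf,
                     beta=0.5, gamma=0.99):
    """Hypothetical augmented bootstrap target (illustrative only).

    scenario_reward: reward computed from a rollout scenario
    v_next_hist:     critic value of the historically observed next state
    v_next_cf:       critic value of a counterfactual next state built
                     from the rollout-implied continuation
    beta:            assumed mixing weight in [0, 1]; beta = 0 pairs the
                     scenario reward with the historical next state (the
                     mismatched case), beta = 1 uses only the
                     counterfactual next state
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    blended_next = (1.0 - beta) * v_next_hist + beta * v_next_cf
    return td_target(scenario_reward, blended_next, gamma)
```

Under these assumptions, intermediate values of `beta` trade the bias of bootstrapping from a state the scenario did not produce against the variance of relying on generated continuations, which is one way to read the bias-variance tradeoff the abstract mentions.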
