LG ML Oct 3, 2020

Mean-Variance Efficient Reinforcement Learning with Applications to Dynamic Financial Investment

arXiv:2010.01404v4
Originality: Incremental advance
AI Analysis

This paper addresses computational challenges in applying RL to dynamic financial investment, though the advance is incremental because it builds on existing constrained optimization approaches.

The study tackles the mean-variance trade-off in reinforcement learning by proposing a method that trains policies to maximize expected quadratic utility, yielding MV-efficient policies at lower computational cost because no gradient estimation of variance is required.

This study investigates the mean-variance (MV) trade-off in reinforcement learning (RL), an instance of sequential decision-making under uncertainty. Our objective is to obtain MV-efficient policies whose means and variances are located on the Pareto efficient frontier with respect to the MV trade-off; under this condition, any increase in the expected reward necessitates a corresponding increase in variance, and vice versa. To this end, we propose a method that trains our policy to maximize the expected quadratic utility, defined as a weighted sum of the first and second moments of the rewards obtained through our policy. We subsequently demonstrate that the maximizer indeed qualifies as an MV-efficient policy. Previous studies that employed constrained optimization to address the MV trade-off have encountered computational challenges. In contrast, our approach is more computationally efficient because it eliminates the need for gradient estimation of variance, a contributing factor to the double sampling issue observed in existing methodologies. Through experimentation, we validate the efficacy of our approach.
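The quadratic-utility objective described in the abstract can be illustrated with a minimal sketch. The exact weighting used in the paper is not given here, so the form E[R] - (alpha / 2) * E[R^2], the parameter name `alpha`, and both function names are assumptions made for illustration only. The point of the sketch is that this objective is a single expectation of a per-sample quantity, whereas the variance contains the term (E[R])^2, whose gradient is a product of two expectations and would need two independent samples for an unbiased estimate.

```python
from statistics import fmean, pvariance


def quadratic_utility(returns, alpha):
    """Sample estimate of E[R] - (alpha / 2) * E[R^2].

    `alpha` is a hypothetical risk-aversion weight; the paper's exact
    weighting of the two moments may differ.
    """
    first_moment = fmean(returns)
    second_moment = fmean([r * r for r in returns])
    return first_moment - 0.5 * alpha * second_moment


def mean_variance_form(returns, alpha):
    """Same quantity rewritten via E[R^2] = Var[R] + (E[R])^2.

    This shows that the quadratic utility rewards a higher mean and
    penalizes variance, without the variance being estimated separately.
    """
    mean = fmean(returns)
    variance = pvariance(returns)  # population variance of the sample
    return mean - 0.5 * alpha * (variance + mean * mean)


# Hypothetical returns from three episodes under some policy.
sampled_returns = [1.0, 2.0, 3.0]
utility = quadratic_utility(sampled_returns, alpha=0.5)
```

Because each episode contributes its own term R - (alpha / 2) * R^2, a standard single-sample policy gradient estimator could in principle be applied to this objective. That is the intuition behind avoiding the double sampling issue, not a reproduction of the authors' algorithm.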

