Variance Adjusted Actor Critic Algorithms
This work addresses the need for risk-aware reinforcement learning algorithms, although the contribution appears incremental: it extends existing actor-critic methods to a variance-adjusted setting.
The authors tackle the problem of optimizing the variance-adjusted expected return in Markov decision processes (MDPs). They propose an actor-critic framework with linear function approximation and compatible features, which yields an episodic algorithm that converges almost surely to a locally optimal point.
We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We then give an episodic actor-critic algorithm and show that it converges almost surely to a locally optimal point of the objective function.
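
As a rough illustration of the objective named above, the following sketch estimates a variance-adjusted return from sampled episode returns. It assumes the objective takes the common form "expected return minus a weight times the variance of the return"; the text above does not spell out the formula, so the function name, the `risk_weight` parameter, and the use of population variance are all assumptions made for illustration, not the authors' definitions.

```python
from statistics import fmean, pvariance


def variance_adjusted_return(returns, risk_weight):
    """Estimate E[R] - risk_weight * Var[R] from sampled episode returns.

    Assumption: variance is penalized linearly with risk_weight >= 0.
    The exact objective used by the authors is not given in the text.
    """
    if not returns:
        raise ValueError("need at least one sampled return")
    mean_return = fmean(returns)
    # Population variance (divides by n); a hypothetical choice for this sketch.
    var_return = pvariance(returns)
    return mean_return - risk_weight * var_return


# Two hypothetical policies with the same mean return but different spread.
steady = [1.0, 1.0, 1.0, 1.0]
risky = [0.0, 2.0, 0.0, 2.0]
print(variance_adjusted_return(steady, risk_weight=0.5))
print(variance_adjusted_return(risky, risk_weight=0.5))
```

Under this assumed objective, the two policies have equal expected return, but the higher-variance one scores lower, which is the kind of preference a variance-adjusted criterion is meant to express.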