GMAC: A Distributional Perspective on Actor-Critic Framework
This addresses distributional issues in reinforcement learning for researchers and practitioners, offering an incremental improvement over conventional actor-critic methods.
The paper tackled distributional instability and action type restrictions in actor-critic methods by proposing GMAC, a distributional framework that minimizes Cramér distance using a Sample-Replacement algorithm and Gaussian Mixture Models. It improved performance in discrete and continuous action spaces on ALE and PyBullet environments with low computational cost.
In this paper, we devise a distributional framework on actor-critic as a solution to distributional instability, action type restriction, and conflation between samples and statistics. We propose a new method that minimizes the Cramér distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm denoted SR($λ$), which learns the correct value distribution under multiple Bellman operations. Parameterizing a value distribution with Gaussian Mixture Model further improves the efficiency and the performance of the method, which we name GMAC. We empirically show that GMAC captures the correct representation of value distributions and improves the performance of a conventional actor-critic method with low computational cost, in both discrete and continuous action spaces using Arcade Learning Environment (ALE) and PyBullet environment.