AI

Jan 19, 2022

Critic Algorithms using Cooperative Networks

arXiv:2201.07839v1

Originality: Incremental advance

AI Analysis

This work addresses policy evaluation for reinforcement learning practitioners, offering incremental improvements in convergence speed.

The authors tackled policy evaluation in Markov Decision Processes by proposing a gradient-based algorithm that tracks the Projected Bellman Error, achieving faster convergence than GTD2 algorithms and comparable results in DQN and DDPG experiments.

An algorithm is proposed for policy evaluation in Markov Decision Processes that gives good empirical results with respect to convergence rates. The algorithm tracks the Projected Bellman Error and is implemented as a true gradient-based algorithm. In this respect, it differs from the TD($\lambda$) class of algorithms. Because it tracks the Projected Bellman Error, it also differs from the class of residual algorithms. Further, its convergence is empirically much faster than that of the GTD2 class of algorithms, which also aim at tracking the Projected Bellman Error. We implemented the proposed algorithm in the DQN and DDPG frameworks and found that it achieves comparable results in both of these experiments.
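For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the GTD2 baseline and of the mean-squared Projected Bellman Error under linear function approximation. It is not the algorithm proposed in the paper, whose details the abstract does not give. The 3-state chain, its rewards, the feature matrix `Phi`, the step sizes, and the helper names `mspbe` and `gtd2_step` are all assumptions made for illustration.

```python
import numpy as np

def mspbe(theta, A, b, C):
    """Mean-squared projected Bellman error: (b - A theta)^T C^{-1} (b - A theta)."""
    g = b - A @ theta
    return float(g @ np.linalg.solve(C, g))

def gtd2_step(theta, w, phi, reward, phi_next, gamma, alpha, beta):
    """One sampled GTD2 update for the linear estimate V(s) = phi(s)^T theta."""
    delta = reward + gamma * phi_next @ theta - phi @ theta    # TD error
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    w_new = w + beta * (delta - phi @ w) * phi                 # auxiliary weights
    return theta_new, w_new

# Hypothetical toy chain: from every state, jump uniformly to any of 3 states.
P = np.full((3, 3), 1.0 / 3.0)
rewards = np.array([0.0, 0.0, 1.0])        # reward received on leaving state s
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])               # one 2-d feature row per state
gamma = 0.9
D = np.diag(np.full(3, 1.0 / 3.0))         # stationary distribution of P

# Model-based quantities, used here only to measure the error being tracked.
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ rewards
C = Phi.T @ D @ Phi

rng = np.random.default_rng(0)
theta, w = np.zeros(2), np.zeros(2)
initial_error = mspbe(theta, A, b, C)

s = 0
for _ in range(20000):
    s_next = rng.choice(3, p=P[s])
    theta, w = gtd2_step(theta, w, Phi[s], rewards[s], Phi[s_next],
                         gamma, alpha=0.05, beta=0.05)
    s = s_next

final_error = mspbe(theta, A, b, C)
print("MSPBE before:", initial_error, "after:", final_error)
```

GTD2 needs only sampled transitions; the matrices `A`, `b`, and `C` are built from the known toy model purely so the Projected Bellman Error can be evaluated before and after training.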

