A2C is a special case of PPO
The result offers a theoretical insight for reinforcement learning researchers, but the contribution is incremental: it refines the understanding of existing algorithms rather than introducing a new method.
The paper clarifies the relationship between A2C and PPO in deep reinforcement learning. Through theoretical and empirical analysis, it shows that A2C is a special case of PPO, and its experiments confirm that the two algorithms produce identical models under controlled settings.
Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms that have been widely used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different from A2C's objective. In this paper, however, we show that A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using \texttt{Stable-baselines3}, showing that A2C and PPO produce the \textit{exact} same models when other settings are controlled.
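
The intuition behind the special-case claim can be illustrated with a small numerical check. The sketch below is not the paper's experiment and does not use \texttt{Stable-baselines3}. It assumes a toy softmax policy over three actions and a single sample, and it assumes the PPO update is evaluated at the same parameters that collected the data, so the probability ratio equals 1. All names (`a2c_objective`, `ppo_objective`, `numerical_grad`, `theta_old`) and the numeric values are hypothetical. Under these assumptions, the gradient of PPO's clipped objective should coincide with the gradient of the A2C policy objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_objective(theta, action, advantage):
    """A2C policy objective for one sample: log pi_theta(a) * A."""
    return math.log(softmax(theta)[action]) * advantage


def ppo_objective(theta, theta_old, action, advantage, clip_eps=0.2):
    """PPO clipped objective for one sample: min(r * A, clip(r) * A)."""
    ratio = softmax(theta)[action] / softmax(theta_old)[action]
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def numerical_grad(f, theta, h=1e-6):
    """Central finite-difference gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


# Hypothetical toy values: policy logits, sampled action, advantage estimate.
theta_old = [0.3, -1.2, 0.8]
action, advantage = 1, -2.5

# Evaluate both gradients at theta == theta_old, where the ratio r is 1.
grad_a2c = numerical_grad(
    lambda t: a2c_objective(t, action, advantage), theta_old
)
grad_ppo = numerical_grad(
    lambda t: ppo_objective(t, theta_old, action, advantage), theta_old
)

max_gap = max(abs(a - b) for a, b in zip(grad_a2c, grad_ppo))
print("A2C gradient:", grad_a2c)
print("PPO gradient:", grad_ppo)
print("max abs difference:", max_gap)
```

The two gradients agree up to finite-difference error because, at a ratio of 1, the clipping is inactive and the derivative of the ratio reduces to the derivative of the log-probability. This covers only the policy term of one sample. An exact match between trained models additionally depends on the other settings being controlled, as stated above.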