Deep Reinforcement Learning with Surrogate Agent-Environment Interface
This work addresses a specific bottleneck in reinforcement learning, namely continuous control of discrete actions, but it appears incremental: it builds on existing deterministic policy gradient methods and is benchmarked against DQN.
The paper tackles continuous control of discrete actions in reinforcement learning by proposing a surrogate agent-environment interface (SAEI) and a new algorithm, probability surrogate action deterministic policy gradient (PSADPG), which matches the performance of DQN on certain tasks during the initial training stage.
In this paper, we propose the surrogate agent-environment interface (SAEI) for reinforcement learning. We also state that learning on the probability surrogate agent-environment interface yields an optimal policy for the task agent-environment interface. We introduce the surrogate probability action and, based on SAEI, develop the probability surrogate action deterministic policy gradient (PSADPG) algorithm. This algorithm enables continuous control of discrete actions. The experiments show that, in the initial training stage, PSADPG matches the performance of DQN on certain tasks whose optimal policy is stochastic in nature.
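To make the idea of a surrogate probability action concrete, the following is a minimal sketch of one possible reading, not the paper's implementation. It assumes the surrogate interface accepts a probability vector over the discrete actions (a continuous action), samples a discrete action from it, and forwards that action to the task environment. The names `TwoArmedBandit` and `ProbabilitySurrogateInterface`, the reward values, and the sampling rule are all hypothetical and chosen only for illustration.

```python
import random


class TwoArmedBandit:
    """Hypothetical task environment with two discrete actions."""

    def step(self, action):
        # Illustrative rewards only: action 1 pays 1.0, action 0 pays 0.0.
        return 1.0 if action == 1 else 0.0


class ProbabilitySurrogateInterface:
    """Assumed surrogate interface: the agent emits a probability vector,
    and the interface samples the discrete action sent to the task."""

    def __init__(self, task_env, seed=0):
        self.task_env = task_env
        self.rng = random.Random(seed)

    def step(self, probs):
        # The surrogate action must be a valid probability distribution.
        if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-8:
            raise ValueError("surrogate action must be a probability vector")
        # Sample the discrete task action from the continuous surrogate action.
        action = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
        reward = self.task_env.step(action)
        return action, reward


def average_reward(interface, probs, n_steps=10_000):
    """Estimate the expected task reward of a fixed surrogate action."""
    total = sum(interface.step(probs)[1] for _ in range(n_steps))
    return total / n_steps


if __name__ == "__main__":
    env = ProbabilitySurrogateInterface(TwoArmedBandit(), seed=0)
    print(average_reward(env, [0.3, 0.7]))
```

Under this assumption, a deterministic actor that outputs `probs` induces a stochastic policy over the discrete actions, which is why a deterministic policy gradient method can in principle be applied to a discrete-action task.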