Orthogonal Policy Gradient and Autonomous Driving Application
This work addresses generalization issues in reinforcement learning for autonomous driving, but it appears incremental because it builds on existing policy gradient methods.
The paper tackles the lack of generalization in deep reinforcement learning for complex tasks by proposing orthogonal policy gradient descent (OPGD), which enables agents to learn policy gradients based on state and action sets, and evaluates it on the TORCS 3D autonomous driving environment compared to a baseline model.
One less-addressed issue in deep reinforcement learning is the lack of generalization to new states and new targets. For complex tasks, it is necessary to give the correct strategy and evaluate all possible actions for the current state. Fortunately, deep reinforcement learning has enabled enormous progress in both subproblems: giving the correct strategy and evaluating all actions based on the state. In this paper, we present an approach called orthogonal policy gradient descent (OPGD) that lets an agent learn the policy gradient based on the current state and the action set, by which the agent can learn a policy network with generalization capability. We evaluate the proposed method on the 3D autonomous driving environment TORCS and compare it with a baseline model; detailed analyses of the experimental results and proofs are also given.
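For readers unfamiliar with the term, the sketch below illustrates what a policy gradient computed from a current state and a discrete action set looks like. It is a standard REINFORCE-style update for a linear softmax policy, not the OPGD method itself, whose details are not given in the abstract. The function names, the linear policy, the feature vector, and the learning rate are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of action logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def reinforce_gradient(theta, state, action, ret):
    """Gradient of ret * log pi(action | state) for a linear softmax policy.

    theta:  (n_actions, n_features) policy parameters (hypothetical layout)
    state:  (n_features,) feature vector of the current state
    action: index of the action taken from the action set
    ret:    scalar return observed after taking the action
    """
    probs = softmax(theta @ state)
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    # For a linear softmax policy:
    # d/dtheta log pi(a|s) = (one_hot(a) - probs) outer state
    return ret * np.outer(one_hot - probs, state)

def reinforce_step(theta, state, action, ret, lr=0.1):
    """One gradient-ascent step on the expected return."""
    return theta + lr * reinforce_gradient(theta, state, action, ret)

# Illustrative values only: 3 actions, 2 state features.
theta = np.zeros((3, 2))
state = np.array([1.0, 0.0])
new_theta = reinforce_step(theta, state, action=1, ret=2.0)
```

With a positive return, the step raises the probability of the chosen action in that state. This baseline behavior is what any policy gradient variant, including the one proposed in the paper, starts from.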