Discrete-to-Deep Supervised Policy Learning
This work addresses sample correlation in reinforcement learning (RL), offering researchers and practitioners an incremental improvement that combines state-space discretisation with supervised learning.
The paper tackles the difficulty of training neural networks in reinforcement learning by proposing Discrete-to-Deep Supervised Policy Learning (D2D-SPL). The method discretises the continuous state space, learns a policy over the discrete states with an actor-critic method, and then trains a classifier on input/target pairs drawn from that policy. It needs no experience replay and learns faster than state-of-the-art methods.
Neural networks are effective function approximators, but they are hard to train in the reinforcement learning (RL) context, mainly because samples are correlated. For years, researchers have worked around this by employing experience replay or an asynchronous parallel-agent system. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space and uses an actor-critic method to learn a policy over the resulting discrete states. Then, for each discrete state, it selects an input value and the action with the highest numerical preference to form an input/target pair. Finally, it uses the input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay and learns much faster than state-of-the-art methods. We test our method on two RL environments: Cartpole and an aircraft manoeuvring simulator.
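
The three stages described above can be illustrated with a minimal sketch. It is not the authors' implementation. The toy one-dimensional environment, the bin count, the learning rates, the use of bin centres as input values, and the logistic-regression classifier (standing in for a neural network) are all assumptions made only for illustration.

```python
import numpy as np

N_BINS, N_ACTIONS = 10, 2            # hypothetical sizes for a toy 1-D task
rng = np.random.default_rng(0)

def discretise(x):
    """Map a continuous state in [-1, 1] to a bin index in [0, N_BINS - 1]."""
    return int(min(N_BINS - 1, max(0, (x + 1.0) / 2.0 * N_BINS)))

def step(x, a):
    """Toy dynamics (assumption): action 0 moves left, action 1 moves right."""
    x_next = float(np.clip(x + (0.1 if a == 1 else -0.1), -1.0, 1.0))
    return x_next, -abs(x_next)      # reward is highest at the origin

def softmax(h):
    z = np.exp(h - h.max())
    return z / z.sum()

def actor_critic(episodes=500, horizon=20, alpha_v=0.1, alpha_h=0.1, gamma=0.9):
    """Stage 1: tabular one-step actor-critic over the discretised states."""
    V = np.zeros(N_BINS)                  # critic: state values
    H = np.zeros((N_BINS, N_ACTIONS))     # actor: numerical action preferences
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            s = discretise(x)
            pi = softmax(H[s])
            a = rng.choice(N_ACTIONS, p=pi)
            x, r = step(x, a)
            delta = r + gamma * V[discretise(x)] - V[s]   # TD error
            V[s] += alpha_v * delta
            H[s] += alpha_h * delta * (np.eye(N_ACTIONS)[a] - pi)
    return H

def extract_pairs(H):
    """Stage 2: one input/target pair per discrete state.

    Assumption: the input value is the bin centre; the target is the
    action with the highest preference in that bin.
    """
    centres = -1.0 + (np.arange(N_BINS) + 0.5) * 2.0 / N_BINS
    return centres.reshape(-1, 1), H.argmax(axis=1)

def train_classifier(X, y, epochs=2000, lr=0.5):
    """Stage 3: supervised learning (logistic regression as a stand-in)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return lambda Xq: (Xq @ w + b > 0).astype(int)

H = actor_critic()
X, y = extract_pairs(H)
policy = train_classifier(X, y)      # maps continuous states to actions
```

The classifier is trained on one pair per discrete state rather than on consecutive transitions, so in this sketch the supervised stage never sees the temporally correlated sample stream.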