Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations
This work surveys policy iteration methods for the approximate solution of Markov decision problems and proposes a new approximate policy iteration scheme that integrates feature-based aggregation with features constructed by deep neural networks.
The paper addresses the approximate solution of finite-state discounted Markov decision problems. It uses features of the states to formulate a smaller aggregate problem, and it introduces a new approximate policy iteration approach that combines aggregation with feature construction by deep neural networks. The authors argue that the nonlinear function of the features produced by aggregation can approximate the cost function of a policy more accurately than the linear function of the features used in neural network-based reinforcement learning, potentially leading to more effective policy improvement.
In this paper we discuss policy iteration methods for the approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
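To make the idea of a smaller aggregate problem concrete, the following is a minimal sketch, not the paper's implementation. It assumes hard aggregation (each original state belongs to exactly one aggregate state, identified by its feature value) and uniform disaggregation probabilities within each group. The function name `solve_aggregate_problem`, the argument layout, and the four-state toy problem are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, List

Matrix = List[List[float]]


def solve_aggregate_problem(
    P: List[Matrix],                    # P[u][i][j]: transition probability i -> j under control u
    g: List[List[float]],               # g[u][i]: expected stage cost at state i under control u
    alpha: float,                       # discount factor, 0 < alpha < 1
    feature: Callable[[int], Hashable], # maps a state to its feature value (assumed hard aggregation)
    iters: int = 500,
) -> Dict[Hashable, float]:
    """Value iteration on the aggregate problem; returns one cost per feature value."""
    n = len(g[0])
    num_controls = len(P)

    # Group original states by feature value: each group is one aggregate state.
    groups: Dict[Hashable, List[int]] = {}
    for i in range(n):
        groups.setdefault(feature(i), []).append(i)

    r = {x: 0.0 for x in groups}
    for _ in range(iters):
        new_r = {}
        for x, members in groups.items():
            total = 0.0
            for i in members:
                # One-step lookahead from original state i, with the cost of
                # each successor j replaced by the cost of its aggregate state.
                total += min(
                    g[u][i] + alpha * sum(P[u][i][j] * r[feature(j)] for j in range(n))
                    for u in range(num_controls)
                )
            # Uniform disaggregation probabilities (an assumption of this sketch).
            new_r[x] = total / len(members)
        r = new_r
    return r


# Hypothetical toy problem: 4 states, control 0 = stay, control 1 = move left.
stay = [[1.0 if j == i else 0.0 for j in range(4)] for i in range(4)]
left = [[1.0 if j == max(i - 1, 0) else 0.0 for j in range(4)] for i in range(4)]
costs = [[0.0, 1.0, 2.0, 3.0],   # stage costs under "stay"
         [0.5, 1.5, 2.5, 3.5]]   # stage costs under "move left"

r = solve_aggregate_problem([stay, left], costs, alpha=0.9, feature=lambda i: i // 2)
approx_cost = [r[i // 2] for i in range(4)]  # piecewise constant over feature values
print(approx_cost)
```

In this sketch the approximate cost of a state is the cost of its aggregate state, so the approximation is piecewise constant in the feature rather than linear in it. With a feature that assigns every state its own value, the aggregate problem coincides with the original one and the iteration reduces to ordinary value iteration.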