A Matrix Splitting Perspective on Planning with Options
This paper offers theoretical insight for reinforcement learning researchers, though the contribution is incremental: it applies a new analytical perspective to the existing options framework.
The paper studies the convergence behavior of planning with options in reinforcement learning. It shows that the Bellman operator underlying the options framework leads to a matrix splitting, and that the asymptotic convergence rate depends on the timescales of the options, which exposes a trade-off between asymptotic performance and computational cost.
We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations. Based on standard comparison theorems for matrix splittings, we then show how the asymptotic rate of convergence varies as a function of the inherent timescales of the options. This new perspective highlights a trade-off between asymptotic performance and the cost of computation associated with building a good set of options.
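To make the matrix splitting idea concrete, here is a minimal sketch for policy evaluation on a small random Markov chain. It is an illustration under stated assumptions, not the paper's own construction: an n-step model stands in for an option with a fixed duration of n steps, and the names `n_step_splitting`, `spectral_radius`, and `iterate` are hypothetical helpers introduced here. The linear system is A v = r with A = I - gamma P, and the splitting A = M - N gives the iteration v <- M^{-1}(N v + r), whose asymptotic rate is the spectral radius of M^{-1} N.

```python
import numpy as np


def n_step_splitting(P, gamma, n):
    """Split A = I - gamma*P as A = M - N, with M^{-1} = sum_{k<n} (gamma*P)^k.

    Assumption for illustration: an n-step model plays the role of an
    option that always runs for exactly n steps.
    """
    size = P.shape[0]
    A = np.eye(size) - gamma * P
    M_inv = np.zeros((size, size))
    term = np.eye(size)
    for _ in range(n):
        M_inv += term
        term = (gamma * P) @ term
    M = np.linalg.inv(M_inv)
    N = M - A
    return M_inv, N


def spectral_radius(X):
    """Largest eigenvalue modulus; governs the asymptotic convergence rate."""
    return np.max(np.abs(np.linalg.eigvals(X)))


def iterate(M_inv, N, r, steps):
    """Run the splitting iteration v <- M^{-1}(N v + r) from v = 0."""
    v = np.zeros_like(r)
    for _ in range(steps):
        v = M_inv @ (N @ v + r)
    return v


rng = np.random.default_rng(0)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)  # make each row a probability distribution
r = rng.random(5)
gamma = 0.9

# Exact solution of (I - gamma*P) v = r, used as a reference.
v_true = np.linalg.solve(np.eye(5) - gamma * P, r)

rates = {}
for n in (1, 2, 4):
    M_inv, N = n_step_splitting(P, gamma, n)
    rates[n] = spectral_radius(M_inv @ N)
    print(f"n={n}: spectral radius of M^-1 N = {rates[n]:.6f}")

M_inv, N = n_step_splitting(P, gamma, 4)
v_approx = iterate(M_inv, N, r, steps=200)
```

In this sketch M^{-1} N equals (gamma P)^n, so the rate is gamma**n: a longer timescale contracts faster per iteration. The cost side of the trade-off is also visible, since building M^{-1} takes n matrix products before any planning step is run.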