Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)
The paper addresses a fundamental challenge in reinforcement learning for complex, high-dimensional MDPs: misspecification. Its solution is novel, though it appears incremental in that it extends existing option-based methods.
The paper tackles misspecified Markov Decision Processes (MDPs), in which function approximation cannot represent any policy with acceptable performance. It introduces IHOMP, an iterative method that learns context-specialized options and combines them to find near-optimal solutions. Experiments show that Option Interruption further improves performance.
For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation. A problem is misspecified whenever the representation cannot express any policy with acceptable performance. We introduce IHOMP: an approach for solving misspecified problems. IHOMP iteratively learns a set of context-specialized options and combines these options to solve an otherwise misspecified problem. Our main contribution is proving that IHOMP enjoys theoretical convergence guarantees. In addition, we extend IHOMP to exploit Option Interruption (OI), enabling it to decide where the learned options can be reused. Our experiments demonstrate that IHOMP can find near-optimal solutions to otherwise misspecified problems and that OI can further improve the solutions.
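To make the idea of misspecification concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a hypothetical 9-state corridor with the goal in the middle and a policy class that can only express one constant action. No single constant action reaches the goal from both sides, so the flat problem is misspecified. Splitting the states into two hand-made partitions and improving one constant-action "option" per partition, one at a time over a few sweeps, recovers a policy that reaches the goal from every state. All names (PARTITIONS, iterative_hierarchical, and so on) are illustrative assumptions.

```python
GAMMA = 0.9          # discount factor (assumed)
HORIZON = 20         # rollout length cap (assumed)
N_STATES = 9         # corridor states 0..8 (hypothetical toy problem)
GOAL = 4             # goal sits in the middle of the corridor
ACTIONS = (-1, +1)   # step left or step right
PARTITIONS = [range(0, 5), range(5, 9)]  # hand-made contexts (assumption)


def rollout_return(start, policy):
    """Discounted return of a deterministic policy: GAMMA**t on reaching GOAL."""
    s = start
    for t in range(HORIZON):
        if s == GOAL:
            return GAMMA ** t
        s = min(max(s + policy(s), 0), N_STATES - 1)
    return 0.0


def average_return(policy):
    """Mean return over all start states."""
    return sum(rollout_return(s, policy) for s in range(N_STATES)) / N_STATES


def partition_of(state):
    for i, part in enumerate(PARTITIONS):
        if state in part:
            return i
    raise ValueError(state)


def combined_policy(option_actions):
    """Act with the option that owns the current state's partition."""
    return lambda s: option_actions[partition_of(s)]


def best_flat_return():
    """Best achievable return when one constant action must cover all states."""
    return max(average_return(lambda s, a=a: a) for a in ACTIONS)


def iterative_hierarchical(n_sweeps=2):
    """Improve one partition's option at a time, holding the others fixed."""
    option_actions = [ACTIONS[1]] * len(PARTITIONS)  # start with "right" everywhere
    for _ in range(n_sweeps):
        for i in range(len(PARTITIONS)):
            def score(a):
                trial = list(option_actions)
                trial[i] = a
                return average_return(combined_policy(trial))
            option_actions[i] = max(ACTIONS, key=score)
    return option_actions, average_return(combined_policy(option_actions))


option_actions, hierarchical_return = iterative_hierarchical()
flat_return = best_flat_return()
```

In this toy setting the combined options outperform the best flat policy because each option only needs to be adequate within its own partition. The sketch uses exhaustive search over two actions in place of any learning procedure, and it does not model Option Interruption.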