Learning Efficiently Function Approximation for Contextual MDP
This work addresses efficient learning in contextual MDPs for reinforcement learning and provides a general reduction to supervised learning. However, it appears incremental, since it builds on existing frameworks and relies on specific assumptions.
The paper tackles the problem of learning contextual Markov Decision Processes (MDPs) using function approximation for rewards and dynamics. It derives polynomial sample and time complexity for both context-dependent and context-independent dynamics, assuming access to an efficient Empirical Risk Minimization (ERM) oracle.
We study learning contextual MDPs using function approximation for both the rewards and the dynamics. We consider both the case where the dynamics depend on the context and the case where they are independent of it. For both models, we derive polynomial sample and time complexity bounds (assuming an efficient ERM oracle). Our methodology gives a general reduction from learning contextual MDPs to supervised learning.
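
To make the ERM oracle assumption concrete, the following is a minimal sketch of what such an oracle might look like for a finite class of reward functions. It is not the paper's algorithm: the squared loss, the finite function class, the scalar context, and the names `erm_oracle`, `RewardFn`, and `Sample` are all assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration: a sample pairs (context, state, action)
# with an observed reward. A scalar context is an assumption of this sketch.
Sample = Tuple[Tuple[float, int, int], float]
RewardFn = Callable[[float, int, int], float]


def erm_oracle(function_class: Sequence[RewardFn], data: Sequence[Sample]) -> RewardFn:
    """Return the function in a finite class with the smallest empirical squared loss.

    Assumes `data` is non-empty. Squared loss is an assumption of this sketch.
    """
    def empirical_loss(f: RewardFn) -> float:
        return sum((f(c, s, a) - r) ** 2 for (c, s, a), r in data) / len(data)

    # Brute-force search over the class; a practical oracle would optimize instead.
    return min(function_class, key=empirical_loss)


# Toy class: reward is a scaled context, independent of state and action.
function_class = [lambda c, s, a, w=w: w * c for w in (0.0, 0.5, 1.0)]
# Toy data generated with scale 0.5 and no noise.
data = [((1.0, 0, 0), 0.5), ((2.0, 1, 0), 1.0), ((4.0, 0, 1), 2.0)]
best = erm_oracle(function_class, data)
```

Here the oracle is a brute-force search, which is only feasible for small finite classes. The abstract's "efficient ERM oracle" assumption amounts to treating this minimization as a black box that runs in polynomial time.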