A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
This is an incremental result that clarifies a theoretical limitation of batch reinforcement learning for researchers in the area.
The paper studies batch reinforcement learning in the discounted setting. It adapts a known hardness result and shows that, in a simplified 2-state MDP with 1-dimensional features, learning is impossible even with infinite data.
Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value functions and good feature coverage in the finite-horizon case. In this note we show that, once the construction is adapted to the discounted setting, it can be simplified to a 2-state MDP with 1-dimensional features, such that learning is impossible even with an infinite amount of data.
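
To make the flavor of such a result concrete, the following is a minimal sketch of a hypothetical two-state, single-action discounted MDP family. It is an illustration under stated assumptions and not necessarily the construction used in the note: the state names `s1` and `s2`, the discount `GAMMA = 0.9`, the feature values, and the helper functions are all invented here. State `s1` moves to `s2` with reward 0, and `s2` self-loops with an unknown reward `r`. The value function is linear in a 1-dimensional feature, the data distribution is supported on `s1` only (so the feature second moment is positive), yet the data carries no information about `r`.

```python
# Illustrative sketch only. The MDP family, names, and constants below are
# assumptions made for this example, not taken from the note itself.

GAMMA = 0.9  # hypothetical discount factor

# 1-dimensional features, shared by every MDP in the family.
FEATURES = {"s1": GAMMA, "s2": 1.0}


def q_values(r, gamma=GAMMA):
    """Exact values when s1 -> s2 with reward 0 and s2 self-loops with reward r."""
    q2 = r / (1.0 - gamma)  # geometric series of the self-loop reward
    q1 = gamma * q2         # one step of reward 0, then the value of s2
    return {"s1": q1, "s2": q2}


def theta(r, gamma=GAMMA):
    """Weight that realizes the values linearly: Q(s) = FEATURES[s] * theta(r)."""
    return r / (1.0 - gamma)


def batch_data(r, n):
    """n transitions (state, reward, next_state) drawn from data supported on s1.

    The argument r is deliberately unused: transitions out of s1 look the same
    for every member of the family.
    """
    return [("s1", 0.0, "s2")] * n


def feature_second_moment():
    """E[phi(s)^2] under the data distribution, which puts all mass on s1."""
    return FEATURES["s1"] ** 2
```

In this sketch, `batch_data(1.0, n)` and `batch_data(-1.0, n)` are identical for every `n`, while `q_values(1.0)` and `q_values(-1.0)` differ in sign, so no estimator that sees only the batch can tell the two MDPs apart.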