Understanding the theoretical properties of projected Bellman equation, linear Q-learning, and approximate value iteration
This work provides incremental theoretical insights for researchers in reinforcement learning, focusing on foundational convergence properties.
The paper analyzes the projected Bellman equation and two associated algorithms, linear Q-learning and approximate value iteration. It establishes sufficient conditions, such as the strictly negatively row dominating diagonal (SNRDD) assumption, for solution existence and convergence, and offers observations on ε-greedy policies.
In this paper, we study the theoretical properties of the projected Bellman equation (PBE) and two algorithms for solving it: linear Q-learning and approximate value iteration (AVI). We consider two sufficient conditions for the existence of a solution to the PBE: the strictly negatively row dominating diagonal (SNRDD) assumption and a condition motivated by the convergence of AVI. The SNRDD assumption also ensures the convergence of linear Q-learning, and we examine its relationship with the convergence of AVI. Lastly, we provide several observations on the solution of the PBE under an $\epsilon$-greedy policy.
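For readers unfamiliar with linear Q-learning, the following is a minimal sketch of one textbook semi-gradient update step with a linear parameterization $Q_\theta(s,a) = \phi(s,a)^\top \theta$. It is an illustration only, not necessarily the exact formulation analyzed in the paper. The function name `linear_q_update` and the parameters `phi_sa`, `next_phis`, `gamma`, and `alpha` are hypothetical names chosen for this example.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(u, v))


def linear_q_update(theta, phi_sa, reward, next_phis, gamma, alpha):
    """One semi-gradient linear Q-learning step (illustrative sketch).

    theta     : current weight vector (list of floats)
    phi_sa    : feature vector of the visited state-action pair
    reward    : observed reward
    next_phis : feature vectors of every action in the next state
    gamma     : discount factor (assumed to lie in [0, 1))
    alpha     : step size (assumed positive)
    """
    q_sa = dot(theta, phi_sa)
    # Greedy bootstrap target: maximum estimated value over next actions.
    max_next = max(dot(theta, phi) for phi in next_phis)
    td_error = reward + gamma * max_next - q_sa
    # Move theta along the feature direction, scaled by the TD error.
    return [t + alpha * td_error * x for t, x in zip(theta, phi_sa)]


# Toy usage with two features and two candidate next actions.
theta0 = [0.0, 0.0]
theta1 = linear_q_update(theta0, [1.0, 0.0], 1.0,
                         [[0.0, 1.0], [1.0, 1.0]], gamma=0.9, alpha=0.5)
theta2 = linear_q_update(theta1, [1.0, 0.0], 1.0,
                         [[0.0, 1.0], [1.0, 1.0]], gamma=0.9, alpha=0.5)
```

Iterating this update is not guaranteed to converge for arbitrary features, which is why sufficient conditions such as the SNRDD assumption are of interest.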