Manel Tagorti, Bruno Scherrer
We consider LSTD($λ$), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a $β$-mixing assumption, we derive, for any value of $λ\in (0,1)$, a high-probability estimate of the rate of convergence of this algorithm to its limit. From this we deduce a high-probability bound on the error of the algorithm, which extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case where $λ=0$. In particular, our analysis sheds some light on how $λ$ should be chosen as a function of the quality of the chosen linear space and the number of samples, in a way that is consistent with simulations.
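For readers unfamiliar with the algorithm, the following is a minimal sketch of the standard LSTD($λ$) estimate computed from a single trajectory; it is an illustration under stated assumptions, not the exact formulation analyzed in the paper. The function name `lstd_lambda`, its argument layout, and the two-state toy chain are hypothetical choices made for this example.

```python
import numpy as np

def lstd_lambda(features, rewards, gamma, lam):
    """Sketch of LSTD(lambda) on one trajectory (illustrative, assumed interface).

    features: array of shape (n + 1, d), feature vectors of the visited states
    rewards:  array of shape (n,), reward observed on each transition
    gamma:    discount factor in [0, 1)
    lam:      eligibility-trace parameter in [0, 1]
    Returns theta such that features @ theta approximates the value function.
    """
    features = np.asarray(features, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    n, d = len(rewards), features.shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for t in range(n):
        # Decay the trace and add the current state's features.
        z = gamma * lam * z + features[t]
        # Accumulate the linear system A theta = b.
        A += np.outer(z, features[t] - gamma * features[t + 1])
        b += z * rewards[t]
    # Assumes A is invertible; no regularization is applied in this sketch.
    return np.linalg.solve(A, b)

# Toy example (assumption): deterministic two-state cycle 0 -> 1 -> 0 -> ...,
# reward 1 when leaving state 0, tabular (one-hot) features.
states = np.arange(201) % 2
phi = np.eye(2)[states]
r = (states[:-1] == 0).astype(float)
theta = lstd_lambda(phi, r, gamma=0.5, lam=0.5)
```

On this deterministic toy chain with one-hot features, the estimate coincides (up to floating-point error) with the true values $V(0)=4/3$ and $V(1)=2/3$, whatever the value of $λ$; the dependence on $λ$ discussed in the abstract only appears with stochastic transitions and an imperfect linear space.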