Statistical Linear Estimation with Penalized Estimators: an Application to Reinforcement Learning
This work addresses value function estimation in reinforcement learning, offering incremental improvements in regularization methods for noisy linear systems.
The paper studies statistical linear inverse problems, in which the coefficients of the linear system are observed with noise, and proposes data-dependent choices of the regularization parameter for penalized estimators that avoid data-splitting. It then applies these results to reinforcement learning, deriving new insights and bounds for linear value function estimation.
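The setting can be written compactly as follows. This is a sketch in our own notation: the symbols $\Delta_A$, $\Delta_b$ and the generic penalty $\mathrm{pen}(\theta)$ are assumptions, and the exact penalty used in the paper is not specified here. Take unknown $A \in \mathbb{R}^{m\times d}$ and $b \in \mathbb{R}^m$ and a positive definite weight matrix $M$. The last line shows the closed form of the squared estimator for the illustrative choice $\mathrm{pen}(\theta) = \|\theta\|_2^2$, obtained by setting the gradient $2\hat A^\top M(\hat A\theta - \hat b) + 2\lambda\theta$ to zero.

```latex
% Noisy observations of the coefficients and the M-weighted two-norm
\hat A = A + \Delta_A, \qquad \hat b = b + \Delta_b, \qquad \|x\|_M = \sqrt{x^\top M x}
% Performance measure: defect with respect to the true, unknown coefficients
L(\theta) = \|A\theta - b\|_M
% Penalized estimators with squared and unsquared data-fit terms
\hat\theta^{\mathrm{sq}}_\lambda \in \arg\min_{\theta} \|\hat A\theta - \hat b\|_M^2 + \lambda\,\mathrm{pen}(\theta),
\qquad
\hat\theta^{\mathrm{unsq}}_\lambda \in \arg\min_{\theta} \|\hat A\theta - \hat b\|_M + \lambda\,\mathrm{pen}(\theta)
% Squared case with pen(theta) = ||theta||_2^2 (illustrative choice)
\hat\theta^{\mathrm{sq}}_\lambda = \left(\hat A^\top M \hat A + \lambda I\right)^{-1} \hat A^\top M \hat b
```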
Motivated by value function estimation in reinforcement learning, we study statistical linear inverse problems, i.e., problems where the coefficients of a linear system to be solved are observed in noise. We consider penalized estimators, where performance is evaluated using a matrix-weighted two-norm of the defect of the estimator measured with respect to the true, unknown coefficients. Two objective functions are considered, depending on whether the error of the defect measured with respect to the noisy coefficients is squared or unsquared. For both cases we propose simple yet novel, theoretically well-founded data-dependent choices of the regularization parameter that avoid data-splitting. A distinguishing feature of our analysis is that we derive deterministic error bounds in terms of the error of the coefficients, so that the stochastic properties of these errors can be analyzed completely separately. We show that our results lead to new insights and bounds for linear value function estimation in reinforcement learning.
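To connect the abstract to linear value function estimation, the sketch below assumes the standard LSTD instance (not spelled out in the abstract): $\hat A = \frac{1}{n}\sum_t \phi_t(\phi_t - \gamma\phi_{t+1})^\top$ and $\hat b = \frac{1}{n}\sum_t \phi_t r_t$, which are unbiased for $A = \Phi^\top D(\Phi - \gamma P\Phi)$ and $b = \Phi^\top D r$ when $s_t$ is drawn from the distribution behind $D$. The helper names (`lstd_statistics`, `penalized_squared`, `m_norm`, `demo`), the identity weight $M = I$, and the ridge penalty $\|\theta\|_2^2$ are hypothetical choices made for illustration. The paper's data-dependent rule for $\lambda$ is not reproduced. Instead, the demo evaluates a few fixed values of $\lambda$ against the true coefficients, which is only possible on a synthetic Markov chain where $A$ and $b$ are known.

```python
import numpy as np


def lstd_statistics(phi, phi_next, rewards, gamma):
    """Empirical LSTD coefficients (A_hat, b_hat) from n sampled transitions.

    phi, phi_next: (n, d) features of s_t and s_{t+1}; rewards: shape (n,).
    """
    n = phi.shape[0]
    a_hat = phi.T @ (phi - gamma * phi_next) / n
    b_hat = phi.T @ rewards / n
    return a_hat, b_hat


def penalized_squared(a_hat, b_hat, m, lam):
    """Closed-form argmin of ||A_hat theta - b_hat||_M^2 + lam * ||theta||_2^2."""
    d = a_hat.shape[1]
    lhs = a_hat.T @ m @ a_hat + lam * np.eye(d)
    rhs = a_hat.T @ m @ b_hat
    return np.linalg.solve(lhs, rhs)


def m_norm(x, m):
    """Matrix-weighted two-norm sqrt(x^T M x)."""
    return float(np.sqrt(x @ m @ x))


def demo(n=5000, gamma=0.9, seed=0):
    """Defect ||A theta - b||_M w.r.t. the TRUE coefficients for a few fixed lambdas."""
    rng = np.random.default_rng(seed)
    num_states, d = 6, 3
    p = rng.random((num_states, num_states))
    p /= p.sum(axis=1, keepdims=True)           # row-stochastic transition matrix P
    r = rng.random(num_states)                   # reward for each state
    feats = rng.random((num_states, d))          # Phi: row s is phi(s)^T
    mu = np.full(num_states, 1.0 / num_states)   # sampling distribution of s_t
    dmat = np.diag(mu)
    # True coefficients: A = Phi^T D (Phi - gamma P Phi), b = Phi^T D r
    a_true = feats.T @ dmat @ (feats - gamma * p @ feats)
    b_true = feats.T @ dmat @ r
    # Sample s_t ~ mu i.i.d. and s_{t+1} ~ P(s_t, .)
    s = rng.choice(num_states, size=n, p=mu)
    s_next = np.array([rng.choice(num_states, p=p[i]) for i in s])
    a_hat, b_hat = lstd_statistics(feats[s], feats[s_next], r[s], gamma)
    m = np.eye(d)
    defects = {}
    for lam in (0.0, 1e-3, 1e-1):
        theta = penalized_squared(a_hat, b_hat, m, lam)
        defects[lam] = m_norm(a_true @ theta - b_true, m)
    return defects


if __name__ == "__main__":
    for lam, defect in demo().items():
        print(f"lambda={lam:g}  defect={defect:.4f}")
```

The squared objective is used here only because it has a closed form; the unsquared objective generally needs an iterative solver and is omitted from this sketch.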