LGOCSTMLDec 9, 2020

Optimal oracle inequalities for solving projected fixed-point equations

arXiv:2012.05299v116 citations
AI Analysis

This research provides a theoretical characterization of the optimal error for solving projected fixed-point equations, which is significant for researchers developing and analyzing stochastic approximation schemes in areas like reinforcement learning and computational methods for differential/integral equations.

This paper investigates methods for solving linear fixed-point equations in Hilbert spaces using random observations and searching over a low-dimensional subspace. It provides instance-dependent upper bounds on the mean-squared error for a linear stochastic approximation scheme with Polyak-Ruppert averaging, showing the bound consists of an approximation error and a statistical error term. The authors also establish lower bounds using information theory, demonstrating that these terms are unimprovable in an instance-dependent sense, and apply these findings to characterize the optimality of temporal difference learning methods.

Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak--Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information theoretic methods, we also establish lower bounds showing that both of these terms cannot be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes