The Sample-Complexity of General Reinforcement Learning
This work addresses the fundamental challenge of sample efficiency in reinforcement learning for researchers and practitioners, offering theoretical guarantees that are incremental but with specific bounds.
The paper tackles the problem of sample complexity in general reinforcement learning by proposing a new algorithm that is near-optimal for all but O(N log^2 N) time-steps with high probability when the environment is from a finite class of N models, and it provides a matching lower bound for this case.
We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models. The algorithm is shown to be near-optimal for all but O(N log^2 N) time-steps with high probability. Infinite classes are also considered where we show that compactness is a key criterion for determining the existence of uniform sample-complexity bounds. A matching lower bound is given for the finite case.