Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation
For practitioners, this work addresses sample efficiency in reinforcement learning, though the contribution is incremental because it builds on existing methods.
The paper tackles the high sample complexity of reinforcement learning by introducing emergency stops (e-stops) that reduce exploration, and reports order-of-magnitude speedups in experiments.
In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
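The abstract describes e-stops only at a high level. The following is a minimal sketch under one stated assumption: the e-stop fires whenever the agent leaves the set of states seen in observed expert trajectories, which is our reading of "from Observation" in the title. The sketch uses a toy grid and uniform-random exploration. `GridEnv`, `build_support`, `EStopWrapper`, and `explore` are hypothetical names for illustration and are not the paper's code. The wrapper also passes the transition's reward through unchanged and adds no penalty, which is another assumption.

```python
import random
from typing import FrozenSet, Iterable, Sequence, Set, Tuple

State = Tuple[int, int]
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


class GridEnv:
    """Hypothetical toy grid: start at (0, 0), reward 1 on reaching the goal."""

    def __init__(self, width: int, height: int, goal: State):
        self.width, self.height, self.goal = width, height, goal
        self.state: State = (0, 0)

    def reset(self) -> State:
        self.state = (0, 0)
        return self.state

    def step(self, action: str) -> Tuple[State, float, bool]:
        dx, dy = ACTIONS[action]
        # Moves into a wall are clamped, so the agent stays in place.
        x = min(max(self.state[0] + dx, 0), self.width - 1)
        y = min(max(self.state[1] + dy, 0), self.height - 1)
        self.state = (x, y)
        done = self.state == self.goal
        return self.state, (1.0 if done else 0.0), done


def build_support(demos: Iterable[Sequence[State]]) -> FrozenSet[State]:
    """Hypothetical helper: every state seen in any observed expert trajectory."""
    return frozenset(s for traj in demos for s in traj)


class EStopWrapper:
    """Assumed e-stop rule: end the episode once the agent leaves the support."""

    def __init__(self, env: GridEnv, support: FrozenSet[State]):
        self.env, self.support = env, support
        self.num_estops = 0

    def reset(self) -> State:
        return self.env.reset()

    def step(self, action: str) -> Tuple[State, float, bool]:
        state, reward, done = self.env.step(action)
        if state not in self.support:
            self.num_estops += 1
            return state, reward, True  # e-stop: force an early reset, no penalty
        return state, reward, done


def explore(env, steps: int, seed: int = 0) -> Set[State]:
    """Uniform-random exploration; returns the set of states the agent acted from."""
    rng = random.Random(seed)
    acted_from: Set[State] = set()
    state = env.reset()
    for _ in range(steps):
        acted_from.add(state)
        state, _, done = env.step(rng.choice(list(ACTIONS)))
        if done:
            state = env.reset()
    return acted_from


# Usage: the expert walks along the bottom row of a 6x6 grid to the goal.
support = build_support([[(x, 0) for x in range(6)]])
plain = explore(GridEnv(6, 6, goal=(5, 0)), steps=2000)
estop = EStopWrapper(GridEnv(6, 6, goal=(5, 0)), support)
guarded = explore(estop, steps=2000)
print(len(plain), len(guarded), estop.num_estops)
```

Every state the guarded agent acts from lies inside the expert support, so its exploration here is confined to at most 6 of the 36 grid states. This illustrates the kind of exploration reduction the abstract credits for the speedup. The cost is that any better states outside the support are cut off, which is presumably where the small asymptotic sub-optimality gap comes from.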