On the continuity and smoothness of the value function in reinforcement learning and optimal control
This work addresses foundational theoretical questions of interest to researchers in reinforcement learning and optimal control, though its contribution is incremental because it builds on existing mathematical analysis.
The paper studies the continuity and smoothness of the value function in reinforcement learning and optimal control. It provides and verifies upper bounds on the value function's modulus of continuity, shows that the value function is Hölder continuous under relatively weak assumptions on the underlying system, and shows that non-differentiable value functions can be made differentiable by slightly perturbing the system.
The value function plays a crucial role in both reinforcement learning and optimal control as a measure of the cumulative future reward an agent receives. It is therefore of interest to study how similar the values of neighboring states are, i.e., to investigate the continuity of the value function. We do so by providing and verifying upper bounds on the value function's modulus of continuity. Additionally, we show that the value function is always Hölder continuous under relatively weak assumptions on the underlying system and that non-differentiable value functions can be made differentiable by slightly "disturbing" the system.
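For readers unfamiliar with the terms used above, the standard textbook definitions can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: $V$ denotes the value function on a state space $X$ with metric $d$, $\omega_V$ its modulus of continuity, and $C$ and $\alpha$ the Hölder constant and exponent. The paper's own setting and notation may differ.

```latex
% Modulus of continuity of V: the largest gap between the values of
% two states that are at most \delta apart (assumed notation).
\[
  \omega_V(\delta) \;=\; \sup\bigl\{\, |V(x) - V(y)| \;:\; x, y \in X,\ d(x, y) \le \delta \,\bigr\}
\]

% V is Hölder continuous with exponent \alpha \in (0, 1] and constant C > 0 if
\[
  |V(x) - V(y)| \;\le\; C \, d(x, y)^{\alpha}
  \qquad \text{for all } x, y \in X,
\]
% which is equivalent to the bound \omega_V(\delta) \le C \, \delta^{\alpha}
% for all \delta \ge 0. The case \alpha = 1 is Lipschitz continuity.
```

In this reading, an upper bound on $\omega_V$ quantifies how much the values of neighboring states can differ, and Hölder continuity is the special case in which that bound is a power of the distance.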