Runtime Verification of Learning Properties for Reinforcement Learning Algorithms
This work addresses inefficiencies in reinforcement learning for applications where interactions with a physical system are expensive, though it appears incremental because it builds on existing verification methods.
The paper tackles the cost and inefficiency of trial-and-error interaction in reinforcement learning. It develops runtime verification techniques that predict when learning fails to meet expectations on quality and timeliness, and it proposes three verification properties, each with design steps for monitoring during operation.
Reinforcement learning (RL) algorithms interact with their environment in a trial-and-error fashion. Such interactions can be expensive, inefficient, and time-consuming when learning takes place on a physical system rather than in a simulation. This work develops new runtime verification techniques to predict when the learning phase has not met, or will not meet, expectations on quality and timeliness. We present three verification properties concerning the quality and timeliness of learning in RL algorithms. For each property, we propose design steps for monitoring and assessing it during the system's operation.
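To make the idea of monitoring a learning property at runtime concrete, the following is a minimal sketch of one possible monitor. It is not taken from the paper: the property checked here ("the moving-average episode return reaches a threshold before a deadline"), the class name `LearningMonitor`, and the parameters `threshold`, `deadline`, and `window` are all hypothetical and chosen only for illustration.

```python
from collections import deque
from enum import Enum


class Verdict(Enum):
    SATISFIED = "satisfied"
    VIOLATED = "violated"
    INCONCLUSIVE = "inconclusive"


class LearningMonitor:
    """Hypothetical monitor for an assumed property:
    the moving-average return reaches `threshold` within `deadline` episodes."""

    def __init__(self, threshold, deadline, window=5):
        self.threshold = threshold
        self.deadline = deadline
        self.returns = deque(maxlen=window)  # keeps only the last `window` returns
        self.episode = 0
        self.verdict = Verdict.INCONCLUSIVE

    def observe(self, episode_return):
        # A final verdict (satisfied or violated) never changes afterwards.
        if self.verdict is not Verdict.INCONCLUSIVE:
            return self.verdict
        self.episode += 1
        self.returns.append(episode_return)
        window_full = len(self.returns) == self.returns.maxlen
        average = sum(self.returns) / len(self.returns)
        if window_full and average >= self.threshold:
            self.verdict = Verdict.SATISFIED
        elif self.episode >= self.deadline:
            # Deadline reached without meeting the quality threshold.
            self.verdict = Verdict.VIOLATED
        return self.verdict


# Usage: feed one return per episode and stop training early on a violation.
monitor = LearningMonitor(threshold=10.0, deadline=8, window=3)
verdicts = [monitor.observe(r) for r in [2, 4, 6, 9, 11, 12, 13]]
```

The three-valued verdict reflects a common choice in runtime verification: the monitor stays inconclusive until the observed prefix is enough to decide the property either way. A predictive variant, as the abstract describes, would additionally estimate whether the deadline can still be met.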