Guaranteeing Control Requirements via Reward Shaping in Reinforcement Learning
This work addresses the need for reliable control in reinforcement learning applications, but its contribution is incremental, since it builds on existing reward shaping methods.
The paper tackles the problem of guaranteeing, before deployment, that reinforcement learning policies meet specific control requirements such as settling time and steady-state error. It presents a reward shaping procedure that ensures optimal policies satisfy these requirements and makes it possible to assess any given policy's compliance. The approach is validated through experiments in OpenAI Gym environments, where it proves consistently effective.
When control problems such as regulation and tracking are addressed through reinforcement learning, it is often necessary to guarantee, prior to deployment, that the acquired policy meets essential performance and stability criteria such as a desired settling time and steady-state error. Motivated by this necessity, we present a set of results and a systematic reward shaping procedure that (i) ensures the optimal policy generates trajectories that align with the specified control requirements and (ii) makes it possible to assess whether any given policy satisfies them. We validate our approach through comprehensive numerical experiments conducted in two representative environments from OpenAI Gym: the Inverted Pendulum swing-up problem and the Lunar Lander. Using both tabular and deep reinforcement learning methods, our experiments consistently confirm that the proposed framework ensures policy adherence to the prescribed control requirements.
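To make the two control requirements concrete, the following minimal sketch checks a recorded trajectory of tracking errors against a settling-time bound and a steady-state error tolerance. It is an illustration only, not the reward shaping procedure of the paper: the function names, the `tolerance` band, the `max_settling_step` bound, and the `window` used to estimate the steady-state error are all assumptions introduced here.

```python
from typing import Optional, Sequence


def settling_step(errors: Sequence[float], tolerance: float) -> Optional[int]:
    """Return the first step from which |error| stays within `tolerance`.

    Returns None if the trajectory is empty or still outside the band
    at its final step (i.e. it never settles within the recorded horizon).
    """
    last_violation = -1
    for k, e in enumerate(errors):
        if abs(e) > tolerance:
            last_violation = k
    settle = last_violation + 1
    return settle if settle < len(errors) else None


def steady_state_error(errors: Sequence[float], window: int) -> float:
    """Largest absolute error over the last `window` steps (hypothetical estimate)."""
    tail = errors[-window:]
    return max(abs(e) for e in tail)


def meets_requirements(
    errors: Sequence[float],
    tolerance: float,
    max_settling_step: int,
    window: int = 10,
) -> bool:
    """Check both requirements on a single recorded trajectory."""
    k = settling_step(errors, tolerance)
    if k is None or k > max_settling_step:
        return False
    return steady_state_error(errors, window) <= tolerance


if __name__ == "__main__":
    # Hypothetical error trajectory, e.g. distance from the target state.
    trajectory = [1.0, 0.5, 0.2, 0.05, 0.01, 0.0, 0.0]
    print(settling_step(trajectory, tolerance=0.1))
    print(meets_requirements(trajectory, tolerance=0.1, max_settling_step=4, window=3))
```

A check of this kind only evaluates individual rollouts of a policy; it does not by itself provide the guarantee on optimal policies that the shaped reward is designed to give.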