Steady-State Error Compensation for Reinforcement Learning with Quadratic Rewards
This work addresses a specific issue in RL for control systems like autonomous driving, offering an incremental improvement to reward function design.
The paper tackles the problem of significant steady-state errors in Reinforcement Learning when using quadratic reward functions, proposing an approach that adds an integral term to these functions. Through experiments on Adaptive Cruise Control and lane change models, the method effectively reduces steady-state errors without causing major spikes in system states.
The choice of reward function in Reinforcement Learning (RL) has attracted significant attention because of its impact on system performance. Quadratic reward functions often leave significant steady-state errors. Absolute-value-type reward functions alleviate this problem, but they tend to induce substantial fluctuations in specific system states, leading to abrupt changes. To address this challenge, this study proposes adding an integral term to quadratic-type reward functions. The integral term makes the RL algorithm take the reward history into account, which in turn reduces steady-state errors. Through experiments and performance evaluations on the Adaptive Cruise Control (ACC) and lane change models, we validate that the proposed method effectively reduces steady-state errors without causing significant spikes in certain system states.
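To make the idea concrete, the following is a minimal sketch of a quadratic reward augmented with an integral term. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the class name `IntegralQuadraticReward`, the weights `q` and `k_i`, the sampling period `dt`, and the choice to accumulate the tracking error and penalize its square. It should be read as one plausible interpretation, not as the paper's reward function.

```python
from dataclasses import dataclass


@dataclass
class IntegralQuadraticReward:
    """Quadratic reward with an added integral penalty (illustrative only)."""

    q: float = 1.0         # weight on the instantaneous squared error (assumed)
    k_i: float = 0.1       # weight on the squared accumulated error (assumed)
    dt: float = 0.1        # sampling period in seconds (assumed)
    integral: float = 0.0  # running integral of the tracking error

    def reset(self) -> None:
        """Clear the accumulated error, e.g. at the start of an episode."""
        self.integral = 0.0

    def __call__(self, error: float) -> float:
        """Return the reward for one step given the current tracking error."""
        # Accumulate the error so that a persistent offset keeps growing.
        self.integral += error * self.dt
        # Penalize both the current error and the accumulated error.
        return -(self.q * error ** 2 + self.k_i * self.integral ** 2)


def plain_quadratic_reward(error: float, q: float = 1.0) -> float:
    """Baseline quadratic reward with no memory of past errors."""
    return -q * error ** 2


if __name__ == "__main__":
    reward_fn = IntegralQuadraticReward()
    # A small constant error mimics a steady-state offset.
    for step in range(5):
        print(step, plain_quadratic_reward(0.05), reward_fn(0.05))
```

Under these assumptions, a small constant error yields the same plain quadratic reward at every step, whereas the integral-augmented reward keeps decreasing for as long as the offset persists, which gives the agent a growing incentive to remove it. Calling `reset()` at episode boundaries keeps the accumulated error from leaking across episodes.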