Multi-CALF: A Policy Combination Approach with Statistical Guarantees
This work addresses stability in reinforcement learning for control tasks, though the contribution appears incremental: it builds on existing policies and adds guarantees.
The paper combines reinforcement learning policies to achieve both performance and stability. The resulting method provides formal convergence guarantees and often outperforms the individual policies.
We introduce Multi-CALF, an algorithm that combines reinforcement learning policies based on their relative value improvements. Our approach integrates a standard RL policy with a theoretically backed alternative policy, inheriting formal stability guarantees while often achieving better performance than either policy individually. We prove that the combined policy converges to a specified goal set with known probability, and we provide precise bounds on the maximum deviation and the convergence time. Empirical validation on control tasks demonstrates improved performance while maintaining the stability guarantees.
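The abstract does not spell out the combination rule, so the following is only a minimal sketch of one way a switch based on value improvement could look, not the Multi-CALF algorithm itself. It assumes a value estimate `value(state)` where larger is better, a learned policy, and a fallback policy that is presumed to carry the stability guarantee. The class name `ValueGatedSwitch`, the threshold `min_gain`, and the acceptance test are all hypothetical, and the probabilistic part of the guarantee is not modeled.

```python
from typing import Callable, Optional, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
Policy = Callable[[State], Action]


class ValueGatedSwitch:
    """Hypothetical gate between a learned policy and a fallback policy.

    The learned policy acts only when the value estimate at the current
    state improves on the best value accepted so far by at least
    `min_gain`. Otherwise the fallback policy (assumed to be the one with
    the stability guarantee) acts for that step.
    """

    def __init__(
        self,
        learned: Policy,
        fallback: Policy,
        value: Callable[[State], float],
        min_gain: float = 1e-3,
    ) -> None:
        self.learned = learned
        self.fallback = fallback
        self.value = value
        self.min_gain = min_gain
        # Best value accepted so far; None until the first call to act().
        self.best_value: Optional[float] = None

    def act(self, state: State) -> Tuple[Action, str]:
        """Return the chosen action and the name of the policy that produced it."""
        v = self.value(state)
        # Assumption: the first state is accepted unconditionally.
        improved = self.best_value is None or v >= self.best_value + self.min_gain
        if improved:
            self.best_value = v
            return self.learned(state), "learned"
        return self.fallback(state), "fallback"
```

In this sketch the accepted values form an increasing sequence, which is the kind of monotone certificate a stability argument could build on; whether Multi-CALF uses this exact test is not stated in the text above.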