Probabilistic Constrained Reinforcement Learning with Formal Interpretability
This work addresses interpretability issues in reinforcement learning for sequential decision-making problems and is an incremental step toward making AI systems more transparent and reliable.
The authors tackle the challenge of interpreting reward functions and policies in reinforcement learning by proposing AWaVO, a method that combines probabilistic inference with formal methods to provide interpretability with convergence guarantees. In simulation and on quadrotor tasks, AWaVO achieves a reasonable trade-off between performance and interpretability compared with benchmarks such as TRPO-IPO, PCPO, and CRPO.
Reinforcement learning can provide effective reasoning for sequential decision-making problems with variable dynamics. In practical implementations, however, such reasoning poses a persistent challenge: interpreting the reward function and the corresponding optimal policy. Consequently, representing sequential decision-making problems as probabilistic inference can have considerable value, since, in principle, inference offers diverse and powerful mathematical tools for inferring the stochastic dynamics whilst also suggesting a probabilistic interpretation of policy optimization. In this study, we propose a novel method, Adaptive Wasserstein Variational Optimization (AWaVO), to tackle these interpretability challenges. Our approach uses formal methods to achieve interpretability in three respects: convergence guarantees, training transparency, and intrinsic decision interpretation. To demonstrate its practicality, we showcase guaranteed interpretability with an optimal global convergence rate in simulation and in practical quadrotor tasks. In comparison with state-of-the-art benchmarks, including TRPO-IPO, PCPO, and CRPO, we empirically verify that AWaVO offers a reasonable trade-off between high performance and sufficient interpretability.
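The idea of treating policy optimization as probabilistic inference is often illustrated with maximum-entropy ("soft") value iteration. In that view, the optimal policy takes the Boltzmann form π(a|s) ∝ exp(Q(s,a)/α), with temperature α. The sketch below is a minimal illustration of that general framework, not of AWaVO. The abstract does not describe AWaVO's Wasserstein-based update or its formal guarantees, so the sketch makes no attempt to model them. It assumes known tabular dynamics and uses the fixed-dynamics (variational) soft Bellman backup. The two-state MDP and the names `soft_value_iteration`, `P`, `R`, and `alpha` are hypothetical.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def soft_value_iteration(P, R, gamma=0.9, alpha=1.0, iters=500):
    """Soft (maximum-entropy) value iteration on a tabular MDP.

    P: (S, A, S) transition probabilities; R: (S, A) rewards.
    Returns soft Q-values, soft state values, and the Boltzmann policy
    pi(a|s) = exp((Q(s,a) - V(s)) / alpha), whose rows sum to 1.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(iters):
        # Expected next-state value under the known dynamics: shape (S, A).
        Q = R + gamma * (P @ V)
        # Soft Bellman backup: a "soft max" over actions instead of a hard max.
        V = alpha * logsumexp(Q / alpha, axis=1)
    pi = np.exp((Q - V[:, None]) / alpha)
    return Q, V, pi


# Hypothetical two-state, two-action MDP used only for illustration.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1 (rewarded)
P[1, 1, 0] = 1.0  # state 1, action 1: move back to state 0
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])

_, _, pi_warm = soft_value_iteration(P, R, alpha=1.0)  # high temperature
_, _, pi_cold = soft_value_iteration(P, R, alpha=0.1)  # low temperature
print(np.round(pi_warm, 3))
print(np.round(pi_cold, 3))
```

Lowering α concentrates probability on the higher-value action. A single parameter therefore makes the trade-off between reward and policy stochasticity explicit. This kind of transparent, probabilistic reading of a policy is what motivates the inference-based view described in the abstract.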