End-to-End Policy Gradient Method for POMDPs and Explainable Agents
This work addresses the need for explainable agents in real-world tasks like autonomous driving, but it appears incremental as it builds on existing RL methods for POMDPs.
The paper tackled the problem of decision-making in partially observable environments by proposing a reinforcement learning algorithm that estimates hidden states through end-to-end training and visualizes them as a state-transition graph. The result showed that the algorithm can solve simple POMDP problems and the visualization makes the agent's behavior interpretable to humans, though no concrete numbers were provided.
Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to real-world tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.