Bi-directional Value Learning for Risk-aware Planning Under Uncertainty: Extended Version
This work addresses risk-aware planning for autonomous systems such as rovers. Because it builds on existing POMDP frameworks, the contribution appears partly incremental, although the bi-directional value learning approach itself is new.
The paper tackles decision-making under uncertainty in autonomous systems by proposing a bi-directional value learning method for risk-aware planning in partially observable settings. It demonstrates the method's ability to evaluate long-range risk and safety on continuous-space rover navigation problems with long planning horizons.
Decision-making under uncertainty is a crucial ability for autonomous systems. In its most general form, this problem can be formulated as a Partially Observable Markov Decision Process (POMDP). The solution policy of a POMDP can be implicitly encoded as a value function. In partially observable settings, the value function is typically learned via forward simulation of the system evolution. Focusing on accurate and long-range risk assessment, we propose a novel method, where the value function is learned in different phases via a bi-directional search in belief space. A backward value learning process provides a long-range and risk-aware base policy. A forward value learning process ensures local optimality and updates the policy via forward simulations. We consider a class of scalable and continuous-space rover navigation problems (RNP) to assess the safety, scalability, and optimality of the proposed algorithm. The results demonstrate the capabilities of the proposed algorithm in evaluating long-range risk/safety of the planner while addressing continuous problems with long planning horizons.
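The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fully observable, discrete 1-D corridor instead of a continuous belief space, and every name and number in it (`backward_values`, `forward_refine`, `RISKY`, the cost constants) is hypothetical. The backward phase propagates a risk-aware cost-to-go outward from the goal; the forward phase then refines the action choice at the current cell by simulation, using the backward values as the estimate beyond the first step.

```python
import random

# Hypothetical toy world: a corridor of cells 0..N-1 with the goal at the right end.
N = 8
GOAL = N - 1
RISKY = {3: 0.3}     # cell -> assumed probability of a failure when entering it
STEP_COST = 1.0      # assumed cost of any move
FAIL_COST = 50.0     # assumed penalty for a failure
ACTIONS = (-1, +1)   # move left / move right


def backward_values(sweeps=N):
    """Backward phase: dynamic programming from the goal yields a base cost-to-go."""
    V = [float("inf")] * N
    V[GOAL] = 0.0
    for _ in range(sweeps):
        # Sweep from the cell next to the goal back toward the start.
        for s in range(N - 2, -1, -1):
            for a in ACTIONS:
                n = s + a
                if 0 <= n < N:
                    # Expected one-step cost includes the expected failure penalty.
                    expected = STEP_COST + RISKY.get(n, 0.0) * FAIL_COST
                    V[s] = min(V[s], expected + V[n])
    return V


def forward_refine(V, s, rng, rollouts=200):
    """Forward phase: sample one-step outcomes from s, then fall back on the base values V."""
    best_action, best_q = None, float("inf")
    for a in ACTIONS:
        n = s + a
        if not 0 <= n < N:
            continue  # action would leave the corridor
        total = 0.0
        for _ in range(rollouts):
            failed = rng.random() < RISKY.get(n, 0.0)
            total += STEP_COST + (FAIL_COST if failed else 0.0) + V[n]
        q = total / rollouts  # Monte Carlo estimate of the cost of taking a in s
        if q < best_q:
            best_action, best_q = a, q
    return best_action, best_q


if __name__ == "__main__":
    base = backward_values()
    action, cost = forward_refine(base, 4, random.Random(0))
    print("base cost-to-go:", base)
    print("refined action at cell 4:", action, "estimated cost:", cost)
```

In this sketch the backward pass plays the role of the long-range, risk-aware base policy, and the forward pass only corrects it locally. In the setting the abstract describes, both phases would operate on beliefs rather than on known states.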