AI | Feb 14, 2024

Entropy-regularized Point-based Value Iteration

arXiv:2402.09388v1 | 1 citation | h-index: 23 | CDIT
Originality: Incremental advance
AI Analysis

This paper addresses the robustness of model-based planning for partially observable problems, offering an incremental improvement over existing methods.

The paper tackles the brittleness of model-based planners in partially observable problems under model and goal uncertainty by proposing an entropy-regularized planner that encourages policy robustness. Across three problem domains, entropy-regularized policies outperform non-entropy-regularized baselines, achieving higher expected returns under modeling errors and higher accuracy in objective inference.

Model-based planners for partially observable problems must accommodate both model uncertainty during planning and goal uncertainty during objective inference. However, model-based planners may be brittle under these types of uncertainty because they rely on an exact model and tend to commit to a single optimal behavior. Inspired by results in the model-free setting, we propose an entropy-regularized model-based planner for partially observable problems. Entropy regularization promotes policy robustness for planning and objective inference by encouraging policies to be no more committed to a single action than necessary. We evaluate the robustness and objective inference performance of entropy-regularized policies in three problem domains. Our results show that entropy-regularized policies outperform non-entropy-regularized baselines in terms of higher expected returns under modeling errors and higher accuracy during objective inference.
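The idea of a policy that is "no more committed to a single action than necessary" can be illustrated with a generic maximum-entropy backup. The sketch below is not taken from the paper: it assumes the standard soft (log-sum-exp) value and softmax policy from the entropy-regularized literature, and it uses a hypothetical simplification of one alpha vector per action to obtain Q-values at a belief. The names `soft_value`, `soft_policy`, `belief_q_values`, and the `temperature` parameter are all illustrative.

```python
import numpy as np


def belief_q_values(belief, alpha_by_action):
    """Q(b, a) = alpha_a . b, assuming one alpha vector per action (a simplification)."""
    return np.asarray(alpha_by_action, dtype=float) @ np.asarray(belief, dtype=float)


def soft_value(q_values, temperature):
    """Entropy-regularized value: temperature * log(sum_a exp(Q(a) / temperature))."""
    q = np.asarray(q_values, dtype=float)
    q_max = q.max()
    # Subtract the maximum before exponentiating for numerical stability.
    return q_max + temperature * np.log(np.sum(np.exp((q - q_max) / temperature)))


def soft_policy(q_values, temperature):
    """Softmax policy: pi(a) proportional to exp(Q(a) / temperature)."""
    q = np.asarray(q_values, dtype=float)
    weights = np.exp((q - q.max()) / temperature)
    return weights / weights.sum()


# Toy two-state belief and three hypothetical actions.
belief = np.array([0.5, 0.5])
alphas = np.array([
    [1.0, 0.0],  # action 0: good in state 0
    [0.0, 1.0],  # action 1: good in state 1
    [0.4, 0.4],  # action 2: a safe middle option
])

q = belief_q_values(belief, alphas)
policy = soft_policy(q, temperature=0.1)
value = soft_value(q, temperature=0.1)
```

In this toy case actions 0 and 1 have equal Q-values, so the softmax policy splits probability evenly between them instead of committing to one arbitrarily, while still giving some mass to the slightly worse action 2. As `temperature` approaches zero, the soft value approaches the ordinary maximum and the policy approaches a greedy one.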

Code Implementations: 1 repo
