AI · Feb 14, 2024

Entropy-regularized Point-based Value Iteration

Harrison Delecki, Marcell Vazquez-Chanlatte, Esen Yel, Kyle Wray, Tomer Arnon, Stefan Witwicki, Mykel J. Kochenderfer

arXiv:2402.09388v1 · 4.2 · 1 citation · h-index: 23 · Has Code · CDIT

Originality: Incremental advance

AI Analysis

This paper addresses robustness issues in planning for partially observable problems, offering an incremental improvement over existing methods.

The paper tackles the brittleness of model-based planners for partially observable problems under model and goal uncertainty by proposing an entropy-regularized planner that encourages robust policies. Across three domains, entropy-regularized policies outperform non-entropy-regularized baselines, achieving higher expected returns under modeling errors and higher accuracy in objective inference.
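The summary describes entropy regularization only at a high level. A common way to entropy-regularize a value backup, borrowed from the model-free setting, is to replace the hard max over actions with a temperature-scaled log-sum-exp, which yields a Boltzmann (softmax) policy instead of a single greedy action. The sketch below shows that step at one belief point, assuming the action values Q(b, a) are already computed. The names `soft_value`, `softmax_policy`, and the temperature `tau` are our own illustrative choices; this is not claimed to be the paper's exact point-based backup.

```python
import math
from typing import Sequence


def soft_value(q_values: Sequence[float], temperature: float) -> float:
    """Soft value tau * log(sum_a exp(Q(b, a) / tau)) at one belief point.

    Illustrative assumption: this mirrors the standard entropy-regularized
    backup; it equals E_pi[Q] + tau * H(pi) for the softmax policy below.
    Uses the log-sum-exp trick for numerical stability.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def softmax_policy(q_values: Sequence[float], temperature: float) -> list[float]:
    """Boltzmann policy pi(a | b) proportional to exp(Q(b, a) / tau).

    Large tau spreads probability across actions; as tau -> 0 the policy
    approaches the greedy (argmax) policy of unregularized planning.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical action values at a single belief point.
q = [1.0, 0.9, 0.0]
spread = softmax_policy(q, temperature=1.0)    # hedges between actions 0 and 1
nearly_greedy = softmax_policy(q, temperature=0.01)  # almost all mass on action 0
```

The temperature controls the trade-off the abstract describes: the policy commits to one action only when the value gap is large relative to `tau`, so near-ties like the first two actions above keep substantial probability at higher temperatures.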

Model-based planners for partially observable problems must accommodate both model uncertainty during planning and goal uncertainty during objective inference. However, model-based planners may be brittle under these types of uncertainty because they rely on an exact model and tend to commit to a single optimal behavior. Inspired by results in the model-free setting, we propose an entropy-regularized model-based planner for partially observable problems. Entropy regularization promotes policy robustness for planning and objective inference by encouraging policies to be no more committed to a single action than necessary. We evaluate the robustness and objective inference performance of entropy-regularized policies in three problem domains. Our results show that entropy-regularized policies outperform non-entropy-regularized baselines in terms of higher expected returns under modeling errors and higher accuracy during objective inference.
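The objective-inference claim can be made concrete with standard Bayesian goal inference, assuming the observer scores each candidate goal by how likely the policy planned for that goal makes the observed actions. This framing, the function `infer_goal_posterior`, and its inputs are illustrative assumptions, not the paper's evaluation code. The key point: a deterministic policy assigns probability zero to any action it would not take, so a single deviation can rule out the true goal, whereas an entropy-regularized policy keeps every action's probability positive.

```python
import math
from typing import Mapping, Sequence


def infer_goal_posterior(
    prior: Mapping[str, float],
    action_probs: Mapping[str, Sequence[float]],
) -> dict[str, float]:
    """Posterior over candidate goals given observed actions.

    action_probs[g][t] is pi_g(a_t | b_t): the probability that the policy
    planned for goal g assigns to the action actually observed at step t.
    Hypothetical interface chosen for illustration.
    """
    log_post = {}
    for goal, p0 in prior.items():
        probs = action_probs[goal]
        if p0 == 0 or any(p == 0 for p in probs):
            log_post[goal] = -math.inf  # a zero-probability action eliminates the goal
        else:
            log_post[goal] = math.log(p0) + sum(math.log(p) for p in probs)
    m = max(log_post.values())
    if m == -math.inf:
        raise ValueError("every goal assigns zero probability to the observations")
    # Normalize in log space to avoid underflow on long action sequences.
    unnorm = {g: math.exp(lp - m) for g, lp in log_post.items()}
    z = sum(unnorm.values())
    return {g: w / z for g, w in unnorm.items()}


# Hypothetical per-step action probabilities from softmax policies for goals A and B.
posterior = infer_goal_posterior(
    {"A": 0.5, "B": 0.5},
    {"A": [0.6, 0.3, 0.7], "B": [0.5, 0.4, 0.2]},
)
# posterior["A"] is about 0.76, i.e. 0.126 / 0.166
```

With deterministic policies the same observations would be scored as 1s and 0s; if the actor deviates once from each goal's greedy action, every goal is eliminated and the function raises, which is one way the brittleness described in the abstract shows up.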

