LG AI ML · Nov 25, 2025

Selecting Belief-State Approximations in Simulators with Latent States

arXiv:2511.20870v1

Originality: Incremental advance

AI Analysis

This work addresses a fundamental but often overlooked issue in simulator calibration and planning: resetting a simulator's state when some of that state is latent. It is relevant to researchers and practitioners in AI and robotics, though the advance is incremental because it builds on existing belief-state sampling concepts.

The paper tackles the problem of selecting among approximate belief-state samplers in simulators with latent states. It reduces this problem to a conditional distribution-selection task and develops a new algorithm and analysis under sampling-only access. It also shows that the latent state-based and observation-based formulations of the selection problem carry different guarantees depending on the downstream roll-out method: observation-based selection may fail under Single-Reset but enjoys guarantees under Repeated-Reset.

State resetting is a fundamental but often overlooked capability of simulators. It supports sample-based planning by allowing resets to previously encountered simulation states, and enables calibration of simulators using real data by resetting to states observed in real-system traces. While often taken for granted, state resetting in complex simulators can be nontrivial: when the simulator comes with latent variables (states), state resetting requires sampling from the posterior over the latent state given the observable history, a.k.a. the belief state (Silver and Veness, 2010). While exact sampling is often infeasible, many approximate belief-state samplers can be constructed, raising the question of how to select among them using only sampling access to the simulator. In this paper, we show that this problem reduces to a general conditional distribution-selection task and develop a new algorithm and analysis under sampling-only access. Building on this reduction, the belief-state selection problem admits two different formulations: latent state-based selection, which directly targets the conditional distribution of the latent state, and observation-based selection, which targets the induced distribution over the observation. Interestingly, these formulations differ in how their guarantees interact with the downstream roll-out methods: perhaps surprisingly, observation-based selection may fail under the most natural roll-out method (which we call Single-Reset) but enjoys guarantees under the less conventional alternative (which we call Repeated-Reset). Together with a discussion of issues such as distribution shift and the choice of sampling policies, our paper reveals a rich landscape of algorithmic choices, theoretical nuances, and open questions in this seemingly simple problem.
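As a rough illustration of what "sampling from the belief state" means, the sketch below draws latent states from the posterior given an observation history by rejection sampling, using only sampling access to a simulator. It is not the paper's algorithm. The toy simulator (one hidden bit observed through noise), the function names, and all parameters are assumptions made up for this example. In realistic simulators this exact approach is usually infeasible, which is why approximate samplers, and a way to select among them, are needed.

```python
import random


def simulate(rng, horizon):
    """Hypothetical toy simulator: draw a hidden bit z, then emit `horizon`
    noisy observations of it (each equals 1 with probability 0.8 if z == 1,
    and with probability 0.2 if z == 0)."""
    z = rng.randint(0, 1)
    p_one = 0.8 if z == 1 else 0.2
    obs = tuple(1 if rng.random() < p_one else 0 for _ in range(horizon))
    return z, obs


def sample_belief_state(rng, history, max_tries=100_000):
    """Draw one latent state from the posterior given `history` by rejection:
    rerun the simulator from scratch and keep the latent state only when the
    simulated observations reproduce the history exactly."""
    target = tuple(history)
    for _ in range(max_tries):
        z, obs = simulate(rng, len(target))
        if obs == target:
            return z
    raise RuntimeError("no simulated run matched the history")


def estimate_belief(history, n=2000, seed=0):
    """Monte Carlo estimate of P(z = 1 | history) from n belief-state samples."""
    rng = random.Random(seed)
    draws = [sample_belief_state(rng, history) for _ in range(n)]
    return sum(draws) / n


if __name__ == "__main__":
    print(estimate_belief((1, 1, 0)))
```

For the history (1, 1, 0), Bayes' rule gives P(z = 1 | history) = 0.064 / (0.064 + 0.016) = 0.8 in this toy model, so the estimate should land close to that value. The acceptance rate shrinks quickly as the history grows, which is the practical reason exact sampling breaks down.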
