AIMar 16

Algorithms for Deciding the Safety of States in Fully Observable Non-deterministic Problems: Technical Report

arXiv:2603.152822.4h-index: 1

AI Analysis

This work addresses safety verification for learned policies in AI planning, offering a more robust algorithm for researchers and practitioners, though it is incremental as it builds on existing pipelines.

The paper tackles the problem of verifying safety guarantees for learned action policies in sequential decision-making under non-determinism, introducing a new policy-iteration algorithm (iPI) that achieves polynomial worst-case runtime while matching the best-case performance of prior methods, with experiments showing it scales exponentially better in ill-suited problems.

Learned action policies are increasingly popular in sequential decision-making, but suffer from a lack of safety guarantees. Recent work introduced a pipeline for testing the safety of such policies under initial-state and action-outcome non-determinism. At the pipeline's core, is the problem of deciding whether a state is safe (a safe policy exists from the state) and finding faults, which are state-action pairs that transition from a safe state to an unsafe one. Their most effective algorithm for deciding safety, TarjanSafe, is effective on their benchmarks, but we show that it has exponential worst-case runtime with respect to the state space. A linear-time alternative exists, but it is slower in practice. We close this gap with a new policy-iteration algorithm iPI, that combines the best of both: it matches TarjanSafe's best-case runtime while guaranteeing a polynomial worst-case. Experiments confirm our theory and show that in problems amenable to TarjanSafe iPI has similar performance, whereas in ill-suited problems iPI scales exponentially better.

View on arXiv PDF

Similar