AI · Feb 14, 2012

Suboptimality Bounds for Stochastic Shortest Path Problems

arXiv:1202.3729v1 · 11 citations
Originality: Incremental advance
AI Analysis

This work addresses a limitation in reinforcement learning and dynamic programming by extending suboptimality bounds to more general stochastic shortest path problems. It appears incremental because it builds on prior results that hold only in special cases.

The paper tackles the problem of computing suboptimality bounds for stochastic shortest path problems, showing that under positive transition costs, these bounds can be easily computed even without the restrictive assumption that all policies are proper, and provides preliminary results for the general case with no cost restrictions.

We consider how to use the Bellman residual of the dynamic programming operator to compute suboptimality bounds for solutions to stochastic shortest path problems. Such bounds have been previously established only in the special case that "all policies are proper," in which case the dynamic programming operator is known to be a contraction, and have been shown to be easily computable only in the more limited special case of discounting. Under the condition that transition costs are positive, we show that suboptimality bounds can be easily computed even when not all policies are proper. In the general case when there are no restrictions on transition costs, the analysis is more complex. But we present preliminary results that show such bounds are possible.
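For orientation, here is a minimal Python sketch of the discounted special case the abstract mentions, where the bound is easy to compute. It relies on the standard contraction argument: if the operator T is a contraction with modulus gamma in the max norm, then max|J - J*| <= max|TJ - J| / (1 - gamma). This is not the paper's result for stochastic shortest path problems. The tiny two-state cost-minimizing MDP and all names (`COSTS`, `TRANS`, `GAMMA`, `bellman_backup`, and so on) are hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state, 2-action discounted MDP (costs are minimized).
GAMMA = 0.9
COSTS = [[1.0, 2.0], [0.5, 1.5]]          # COSTS[s][a]
TRANS = [                                  # TRANS[s][a][t] = P(t | s, a)
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]


def bellman_backup(J, costs=COSTS, trans=TRANS, gamma=GAMMA):
    """Apply the DP operator once: (TJ)(s) = min_a [c(s,a) + gamma * E[J(s')]]."""
    return [
        min(
            costs[s][a] + gamma * sum(p * J[t] for t, p in enumerate(trans[s][a]))
            for a in range(len(costs[s]))
        )
        for s in range(len(J))
    ]


def bellman_residual(J):
    """Max-norm Bellman residual ||TJ - J||."""
    return max(abs(tj - j) for tj, j in zip(bellman_backup(J), J))


def discounted_error_bound(J):
    """Upper bound on ||J - J*|| in the discounted case: residual / (1 - gamma)."""
    return bellman_residual(J) / (1.0 - GAMMA)


def value_iteration(J, sweeps=2000):
    """Repeated backups; converges to J* because T is a gamma-contraction."""
    for _ in range(sweeps):
        J = bellman_backup(J)
    return J


if __name__ == "__main__":
    J0 = [0.0, 0.0]
    J_star = value_iteration(J0)
    true_error = max(abs(a - b) for a, b in zip(J0, J_star))
    print("bound:", discounted_error_bound(J0), "true error:", true_error)
```

The bound needs only one backup of the candidate J, which is what makes it cheap in the discounted setting. Without discounting, the division by 1 - gamma is no longer available, which is the gap the paper addresses.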

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
