AIROApr 7, 2024

On the Uniqueness of Solution for the Bellman Equation of LTL Objectives

arXiv:2404.05074v15 citationsh-index: 6L4DC
Originality Incremental advance
AI Analysis

This work identifies and resolves a theoretical gap in reinforcement learning for LTL planning, which is incremental but crucial for accurate evaluation in domain-specific applications like robotics and verification.

The paper addresses the non-uniqueness of solutions to the Bellman equation in surrogate reward methods for linear temporal logic (LTL) objectives, showing that setting one discount factor to one can lead to multiple solutions and inaccurate return estimates. It proposes and proves a condition requiring solutions in rejecting bottom strongly connected components to be zero to ensure uniqueness.

Surrogate rewards for linear temporal logic (LTL) objectives are commonly utilized in planning problems for LTL objectives. In a widely-adopted surrogate reward approach, two discount factors are used to ensure that the expected return approximates the satisfaction probability of the LTL objective. The expected return then can be estimated by methods using the Bellman updates such as reinforcement learning. However, the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We demonstrate with an example that when one of the discount factors is set to one, as allowed in many previous works, the Bellman equation may have multiple solutions, leading to inaccurate evaluation of the expected return. We then propose a condition for the Bellman equation to have the expected return as the unique solution, requiring the solutions for states inside a rejecting bottom strongly connected component (BSCC) to be 0. We prove this condition is sufficient by showing that the solutions for the states with discounting can be separated from those for the states without discounting under this condition

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes