New Sufficient Conditions for Lower Bounding the Optimal Policy of a POMDP using Lehmann Precision
For researchers working on POMDPs, this provides a more general sufficient condition under which a myopic policy lower bounds the optimal policy, though the gain over the existing Blackwell dominance condition is incremental.
The paper introduces two new sufficient conditions (Lehmann precision and copositive dominance) under which the optimal policy of a POMDP can be lower bounded by a myopic policy, fixing problems with two assumptions in Lovejoy 1987 and Rieder 1991. Numerical examples show cases where Lehmann precision holds but Blackwell dominance does not, demonstrating its utility in controlled sensing.
This paper provides new sufficient conditions under which the optimal policy of a partially observed Markov decision process (POMDP) can be lower bounded by a myopic policy. The two proposed conditions, namely Lehmann precision and copositive dominance, completely fix the problems with two crucial assumptions in the well-known papers of Lovejoy 1987 and Rieder 1991. For controlled sensing POMDPs, Lehmann precision exploits both convexity and monotonicity of the value function, whereas the classical Blackwell dominance exploits only convexity. Numerical examples are presented where Lehmann precision holds but Blackwell dominance does not, thereby illustrating the usefulness of the main result in controlled sensing applications.
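For readers unfamiliar with the classical baseline, the sketch below illustrates how Blackwell dominance between two observation matrices might be checked numerically. It uses the standard definition: an observation matrix B1 Blackwell dominates B2 if B2 = B1 R for some row-stochastic matrix R. The code is not taken from the paper. The function names `solve_square` and `blackwell_dominates`, the example matrices, and the restriction to a square invertible B1 (where R is unique) are all assumptions made for illustration; the general case would need a linear-programming feasibility test instead. Lehmann precision is not covered here.

```python
from fractions import Fraction


def solve_square(A, B):
    """Solve A R = B for R by Gauss-Jordan elimination.

    Assumes A is square and invertible (an assumption of this sketch).
    Entries should be ints, Fractions, or decimal strings so that the
    arithmetic stays exact.
    """
    n = len(A)
    # Augmented matrix [A | B] in exact rational arithmetic.
    aug = [[Fraction(x) for x in A[i]] + [Fraction(x) for x in B[i]]
           for i in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("A is singular; not handled by this sketch")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def blackwell_dominates(B1, B2):
    """Return True if B2 = B1 R for a row-stochastic R (B1 invertible)."""
    R = solve_square(B1, B2)
    nonnegative = all(v >= 0 for row in R for v in row)
    rows_sum_to_one = all(sum(row) == 1 for row in R)
    return nonnegative and rows_sum_to_one


# Hypothetical 2-state, 2-observation example; rows are P(observation | state).
B1 = [[Fraction(9, 10), Fraction(1, 10)],
      [Fraction(1, 5), Fraction(4, 5)]]
# B2 was built as B1 R with R = [[3/4, 1/4], [1/4, 3/4]], so it is a
# garbled (less informative) version of B1.
B2 = [[Fraction(7, 10), Fraction(3, 10)],
      [Fraction(7, 20), Fraction(13, 20)]]

print(blackwell_dominates(B1, B2))
print(blackwell_dominates(B2, B1))
```

In this example the more accurate sensor B1 dominates B2 but not the reverse, since the matrix mapping B2 back to B1 has negative entries. Blackwell dominance is only a partial order, so for many sensor pairs neither direction holds; this is the situation in which the abstract's alternative condition, Lehmann precision, becomes relevant.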