Policy Iteration for Stationary Discounted Hamilton--Jacobi--Bellman Equations: A Viscosity Approach
This work provides the first rigorous convergence analysis of PI for stationary HJB equations in the viscosity sense, addressing a fundamental ill-posedness issue for PDE-based optimal control.
Policy iteration (PI) is shown to be well-posed for stationary discounted HJB equations via a monotone semi-discrete scheme with artificial viscosity, achieving geometric convergence and a sharp O(√h) error bound, validated in 1D and 2D control problems.
We study policy iteration (PI) for deterministic infinite-horizon discounted optimal control problems, whose value function is characterized by a stationary Hamilton--Jacobi--Bellman (HJB) equation. At the PDE level, PI is fundamentally ill-posed: the improvement step requires pointwise evaluation of $\nabla V$, which is not well defined for viscosity solutions, and thus the associated nonlinear operator cannot be interpreted in a stable functional sense. We develop a monotone semi-discrete formulation for the stationary discounted setting by introducing a space-discrete scheme with artificial viscosity of order $O(h)$. This regularization restores comparison, ensures monotonicity of the discrete operator, and yields a well-defined pointwise policy improvement via discrete gradients. Our analysis reveals a convergence mechanism fundamentally different from the finite-horizon case. For each fixed mesh size $h>0$, we prove that the semi-discrete PI sequence converges monotonically and geometrically to the unique discrete solution, where the contraction is induced by the resolvent structure of the discounted operator. We further establish the sharp vanishing-viscosity estimate $\|V^h - V\|_{L^\infty} \leq C\sqrt{h}$, and derive a quantitative error decomposition that separates policy iteration error from discretization error, exhibiting a nontrivial coupling between iteration count and mesh size. Numerical experiments in nonlinear one and two-dimensional control problems confirm the theoretical predictions, including geometric convergence and the characteristic decay-then-plateau behavior of the total error.