AI, Nov 11, 2025

Neural Value Iteration

Yang You, Ufuk Çakır, Alex Schutz, Robert Skilton, Nick Hawes

arXiv:2511.08825v1 | 3.3 | h-index: 11

Originality: Incremental advance

AI Analysis

The paper addresses scalability issues in POMDP planning for robotics and AI applications, though it builds incrementally on point-based value iteration.

The authors tackle the computational intractability of solving large-scale POMDPs by representing the value function as a finite set of neural networks instead of α-vectors, and report near-optimal solutions on problems that are intractable for existing offline solvers.

The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as α-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on α-vectors at reachable belief points until convergence. However, since each α-vector is |S|-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called Neural Value Iteration, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
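To make the α-vector representation concrete, here is a minimal sketch of the classical point-based scheme the abstract describes. It is not the paper's Neural Value Iteration algorithm. It assumes a small tabular POMDP, and the array layouts `T`, `O`, `R` and the function names `value` and `point_based_backup` are hypothetical choices made for illustration.

```python
import numpy as np

def value(belief, alphas):
    """PWLC value function: V(b) = max over alpha-vectors of <alpha, b>."""
    return float(np.max(alphas @ belief))

def point_based_backup(belief, alphas, T, O, R, gamma):
    """One Bellman backup at a single belief point (classical PBVI-style).

    Assumed (hypothetical) array layouts:
      T[a, s, s2] = P(s2 | s, a)   transition model
      O[a, s2, o] = P(o | s2, a)   observation model
      R[a, s]     = immediate reward
      alphas      = array of shape (num_alphas, |S|)
    Returns the new |S|-dimensional alpha-vector that is best at `belief`.
    """
    num_actions = T.shape[0]
    num_obs = O.shape[2]
    best_vec, best_val = None, -np.inf
    for a in range(num_actions):
        vec = R[a].astype(float).copy()
        for o in range(num_obs):
            # g[i, s] = sum_{s2} T[a, s, s2] * O[a, s2, o] * alphas[i, s2]
            g = (alphas * O[a, :, o]) @ T[a].T
            # Keep the projected alpha-vector that is best at this belief.
            vec += gamma * g[np.argmax(g @ belief)]
        val = float(vec @ belief)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec
```

Every intermediate array in the backup has a dimension of size |S|, and the projection step multiplies by an |S| x |S| transition matrix, which is why the cost grows quickly with the size of the state space.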

