AI · Jul 29, 2025

Unrolling Dynamic Programming via Graph Filters

Sergio Rozada, Samuel Rey, Gonzalo Mateos, Antonio G. Marques

arXiv:2507.21705v1 · 1 citation · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses the computational cost of dynamic programming, a tool used across many engineering fields, and proposes a new method that represents an incremental advance.

The authors tackle the computational expense of dynamic programming in large state-action spaces by proposing BellNet, a learnable parametric model that unrolls policy iterations and re-parameterizes them as a cascade of nonlinear graph filters. In preliminary grid-world experiments, BellNet approximates optimal policies effectively in fewer iterations than classical methods.
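The graph-filter reading can be checked on the simplest case. Under a fixed policy, let P_π be the row-stochastic state-to-state transition matrix (the weighted adjacency of the graph) and r_π the expected reward per state. K steps of iterative policy evaluation v ← r_π + γ P_π v unroll into v_K = Σ_{k=0}^{K-1} γ^k P_π^k r_π + γ^K P_π^K v_0, which is a polynomial graph filter with taps h_k = γ^k applied to the reward signal, plus a transient term from the initialization. The sketch below uses hypothetical function names and a randomly generated MDP to verify this standard identity numerically; it is an illustration, not the authors' BellNet code.

```python
import numpy as np

def policy_evaluation_iterative(P_pi, r_pi, gamma, v0, K):
    """Run K Bellman evaluation steps v <- r_pi + gamma * P_pi @ v."""
    v = v0.copy()
    for _ in range(K):
        v = r_pi + gamma * P_pi @ v
    return v

def policy_evaluation_as_graph_filter(P_pi, r_pi, gamma, v0, K):
    """Same result as a polynomial graph filter in the shift operator P_pi:
    v_K = sum_{k<K} h_k P_pi^k r_pi + gamma^K P_pi^K v0, with taps h_k = gamma^k."""
    h = gamma ** np.arange(K)          # classical filter taps h_k = gamma^k
    out = np.zeros_like(r_pi)
    shifted = r_pi.copy()              # P_pi^0 r_pi
    for k in range(K):
        out += h[k] * shifted
        shifted = P_pi @ shifted       # diffuse the signal one more hop on the graph
    v_shift = v0.copy()                # transient term gamma^K P_pi^K v0
    for _ in range(K):
        v_shift = P_pi @ v_shift
    return out + gamma ** K * v_shift

# Random row-stochastic transition matrix under a fixed policy (toy data).
rng = np.random.default_rng(0)
n = 6
P_pi = rng.random((n, n))
P_pi /= P_pi.sum(axis=1, keepdims=True)
r_pi = rng.standard_normal(n)
v0 = rng.standard_normal(n)

a = policy_evaluation_iterative(P_pi, r_pi, 0.9, v0, K=8)
b = policy_evaluation_as_graph_filter(P_pi, r_pi, 0.9, v0, K=8)
print(np.allclose(a, b))  # True
```

Replacing the fixed taps γ^k with free coefficients is what turns this filter into something learnable.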

Dynamic programming (DP) is a fundamental tool used across many engineering fields. The main goal of DP is to solve Bellman's optimality equations for a given Markov decision process (MDP). Standard methods like policy iteration exploit the fixed-point nature of these equations to solve them iteratively. However, these algorithms can be computationally expensive when the state-action space is large or when the problem involves long-term dependencies. Here we propose a new approach that unrolls and truncates policy iterations into a learnable parametric model dubbed BellNet, which we train to minimize the so-termed Bellman error from random value function initializations. Viewing the transition probability matrix of the MDP as the adjacency of a weighted directed graph, we draw insights from graph signal processing to interpret (and compactly re-parameterize) BellNet as a cascade of nonlinear graph filters. This fresh look facilitates a concise, transferable, and unifying representation of policy and value iteration, with an explicit handle on complexity during inference. Preliminary experiments conducted in a grid-like environment demonstrate that BellNet can effectively approximate optimal policies in a fraction of the iterations required by classical methods.
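The unrolling idea can be sketched as follows, assuming a tabular MDP with transition tensor P[s, a, s'] and reward r[s, a]. The layer structure is an illustrative reading of the abstract, not the authors' parameterization. Each hypothetical layer picks the greedy policy from the current Q-values (the nonlinearity). It then builds the state-action shift P_π and applies a polynomial filter with taps h_0, …, h_K, shared across layers by assumption. All names (`greedy_shift`, `bellnet_layer`, `classical_taps`, …) are hypothetical. The abstract does not detail the training procedure, so a derivative-free random search on the average Bellman error over random initializations stands in for it.

```python
import numpy as np

def greedy_shift(P, q):
    """State-action shift P_pi for the greedy policy of q (hypothetical helper).
    Entry [(s, a), (s', a')] equals P[s, a, s'] if a' is greedy at s', else 0."""
    S, A, _ = P.shape
    greedy = q.argmax(axis=1)                       # nonlinearity: greedy improvement
    select = np.zeros((S, S * A))
    select[np.arange(S), np.arange(S) * A + greedy] = 1.0
    return P.reshape(S * A, S) @ select

def bellnet_layer(P, r, q, taps):
    """Graph filter: sum_k taps[k] P_pi^k r + taps[-1] P_pi^K q, with K = len(taps) - 1."""
    P_pi = greedy_shift(P, q)
    x, z = r.reshape(-1), q.reshape(-1)
    out = np.zeros_like(x)
    for h_k in taps[:-1]:
        out += h_k * x
        x = P_pi @ x                                # one more hop on the state-action graph
        z = P_pi @ z
    return (out + taps[-1] * z).reshape(r.shape)

def bellnet(P, r, q0, taps, num_layers):
    q = q0
    for _ in range(num_layers):                     # truncated unrolling
        q = bellnet_layer(P, r, q, taps)
    return q

def bellman_error(P, r, q, gamma):
    """Sup-norm residual of Bellman's optimality equation for Q-values."""
    Tq = r + gamma * np.einsum("sat,t->sa", P, q.max(axis=1))
    return np.abs(Tq - q).max()

def classical_taps(gamma, K):
    """Taps (1, gamma, ..., gamma^K): each layer = greedy step + K evaluation steps."""
    return gamma ** np.arange(K + 1)

# Toy random MDP and random Q initializations.
rng = np.random.default_rng(0)
S, A, gamma, K, num_layers = 6, 2, 0.9, 2, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
r = rng.standard_normal((S, A))
inits = [rng.standard_normal((S, A)) for _ in range(8)]

def loss(taps):
    """Average Bellman error of the unrolled model's output."""
    return np.mean([bellman_error(P, r, bellnet(P, r, q0, taps, num_layers), gamma)
                    for q0 in inits])

taps = classical_taps(gamma, K)
classical_loss = best = loss(taps)
for _ in range(200):        # random search, a stand-in for the unspecified training method
    cand = taps + 0.05 * rng.standard_normal(taps.shape)
    c = loss(cand)
    if c < best:            # accept only improvements
        taps, best = cand, c
print(f"classical taps: {classical_loss:.3f}  tuned taps: {best:.3f}")
```

With taps (1, γ) a single layer reduces to one value-iteration step. With taps (1, γ, …, γ^K) each layer performs a greedy improvement followed by K policy-evaluation steps, i.e. modified policy iteration. This is one way to read the claim that the filter view unifies policy and value iteration, with K and the number of layers as explicit complexity knobs.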
