AILGMar 20, 2025

Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming

arXiv:2503.16371v22 citationsh-index: 8CPAIOR
Originality Incremental advance
AI Analysis

This work addresses the challenge of improving search efficiency in DIDP for combinatorial optimization, offering a novel integration of RL that yields practical gains but is incremental in combining existing RL methods with DIDP.

The paper tackles the problem of guiding Domain-Independent Dynamic Programming (DIDP) for combinatorial optimization by proposing reinforcement learning-based heuristics, resulting in significant performance improvements over standard DIDP and problem-specific greedy heuristics in terms of node expansions and run-time on benchmark domains.

Domain-Independent Dynamic Programming (DIDP) is a state-space search paradigm based on dynamic programming for combinatorial optimization. In its current implementation, DIDP guides the search using user-defined dual bounds. Reinforcement learning (RL) is increasingly being applied to combinatorial optimization problems and shares several key structures with DP, being represented by the Bellman equation and state-based transition systems. We propose using reinforcement learning to obtain a heuristic function to guide the search in DIDP. We develop two RL-based guidance approaches: value-based guidance using Deep Q-Networks and policy-based guidance using Proximal Policy Optimization. Our experiments indicate that RL-based guidance significantly outperforms standard DIDP and problem-specific greedy heuristics with the same number of node expansions. Further, despite longer node evaluation times, RL guidance achieves better run-time performance than standard DIDP on three of four benchmark domains.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes