LG, AI | Sep 11, 2025

Vejde: A Framework for Inductive Deep Reinforcement Learning Based on Factor Graph Color Refinement

arXiv:2509.09219v1 | h-index: 4 | Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

This work addresses generalization in reinforcement learning for problems with structured states, offering an incremental, domain-specific solution.

The paper tackles the problem of producing inductive policy functions for decision problems with richly structured states, such as object classes and relations. It introduces Vejde, a framework combining data abstraction, graph neural networks, and reinforcement learning. Results show that Vejde policies generalize to unseen test instances without a significant loss in score and achieve scores close to those of instance-specific MLP agents.

We present and evaluate Vejde, a framework that combines data abstraction, graph neural networks and reinforcement learning to produce inductive policy functions for decision problems with richly structured states, such as object classes and relations. MDP states are represented as databases of facts about entities, and Vejde converts each state to a bipartite graph, which is mapped to latent states through neural message passing. The factored representation of both states and actions allows Vejde agents to handle problems of varying size and structure. We tested Vejde agents on eight problem domains defined in RDDL, with ten problem instances each, where policies were trained using both supervised and reinforcement learning. To test policy generalization, we separated the problem instances into two sets, one for training and the other solely for testing. Test results on unseen instances for the Vejde agents were compared to MLP agents trained on each problem instance, as well as to the online planning algorithm Prost. Our results show that Vejde policies on average generalize to the test instances without a significant loss in score. Additionally, the inductive agents received scores on unseen test instances that on average were close to those of the instance-specific MLP agents.
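
To make the state representation more concrete, here is a minimal sketch of how a set of facts about entities could be turned into a bipartite graph and then refined with classical (non-neural) color refinement. It is not Vejde's implementation: the function names, the fact format `(predicate, arg1, ...)`, and the toy room/robot facts are all assumptions made for illustration, and the learned neural message passing described in the abstract is replaced by plain signature hashing.

```python
def facts_to_bipartite(facts):
    """Turn facts (predicate, arg1, arg2, ...) into a bipartite graph.

    One node per entity, one node per fact; each edge links a fact to an
    argument entity and records the argument position.
    (Hypothetical helper, not part of any Vejde API.)
    """
    entities = sorted({arg for _, *args in facts for arg in args})
    index = {name: i for i, name in enumerate(entities)}
    factors = [pred for pred, *_ in facts]
    edges = [
        (f, index[arg], pos)
        for f, (_, *args) in enumerate(facts)
        for pos, arg in enumerate(args)
    ]
    return entities, factors, edges


def _compress(signatures):
    # Replace each distinct signature with a small integer color.
    ids = {s: i for i, s in enumerate(sorted(set(signatures)))}
    return [ids[s] for s in signatures]


def refine_colors(entities, factors, edges, rounds=2):
    """Classical color refinement on the bipartite graph.

    Entities start with one shared color; facts start colored by predicate.
    Each round, a node's new color depends on its old color and the
    (position, color) pairs of its neighbors.
    """
    palette = {p: i for i, p in enumerate(sorted(set(factors)))}
    fac = [palette[p] for p in factors]
    ent = [0] * len(entities)
    for _ in range(rounds):
        fac = _compress([
            (fac[f], tuple(sorted((pos, ent[e]) for g, e, pos in edges if g == f)))
            for f in range(len(factors))
        ])
        ent = _compress([
            (ent[e], tuple(sorted((pos, fac[f]) for f, d, pos in edges if d == e)))
            for e in range(len(entities))
        ])
    return dict(zip(entities, ent))


# Toy state (hypothetical domain): two rooms, one robot located in room "a".
state = [("room", "a"), ("room", "b"), ("at", "r1", "a")]
entities, factors, edges = facts_to_bipartite(state)
colors = refine_colors(entities, factors, edges)
```

Because the graph is built from whatever facts are present, the same two functions apply to states with any number of entities, which is the property that lets a policy defined over such graphs transfer across instances of different size. In the toy state, rooms `a` and `b` end up with different colors only because the robot is located in `a`; without the `at` fact they would be indistinguishable.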
