AI · LG · Feb 11

Neuro-symbolic Action Masking for Deep Reinforcement Learning

arXiv:2602.10598v1 · 1 citations · h-index: 7

Originality: Highly original

AI Analysis

This paper addresses constraint violations in DRL for domains with explicit constraints. It represents an incremental advance that integrates symbolic reasoning with deep policy optimization.

The paper tackles the problem of deep reinforcement learning exploring infeasible actions by proposing Neuro-symbolic Action Masking (NSAM), which automatically learns symbolic models and action masks to constrain actions, resulting in significantly improved sample efficiency and reduced constraint violations in multiple domains.

Deep reinforcement learning (DRL) may explore infeasible actions during training and execution. Existing approaches assume a symbol grounding function that maps high-dimensional states to consistent symbolic representations, as well as manually specified action masking techniques to constrain actions. In this paper, we propose Neuro-symbolic Action Masking (NSAM), a novel framework that automatically learns symbolic models, which are consistent with given domain constraints of high-dimensional states, in a minimally supervised manner during the DRL process. Based on the learned symbolic model of states, NSAM learns action masks that rule out infeasible actions. NSAM enables end-to-end integration of symbolic reasoning and deep policy optimization, where improvements in symbolic grounding and policy learning mutually reinforce each other. We evaluate NSAM on multiple domains with constraints, and experimental results demonstrate that NSAM significantly improves the sample efficiency of DRL agents while substantially reducing constraint violations.
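
To make the idea of an action mask concrete, here is a minimal sketch of generic action masking applied to a policy's output. It is not NSAM's implementation: the function name `masked_softmax`, the four-action example, and the hand-written mask are all assumptions for illustration, whereas the abstract describes the mask as being learned from a symbolic model of the state.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over logits where infeasible actions (mask False) get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    # Infeasible actions contribute zero mass instead of exp(logit).
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: a constraint rules out actions 1 and 3.
# In NSAM the mask would be learned; here it is written by hand.
logits = [1.0, 3.0, 1.0, 0.5]
mask = [True, False, True, False]
probs = masked_softmax(logits, mask)
print(probs)  # [0.5, 0.0, 0.5, 0.0]
```

Although action 1 has the highest logit, the mask removes it, so the agent can only sample from the two feasible actions. Applying the mask before normalization keeps the result a valid probability distribution.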
