AIMay 27, 2022

GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

Yushi Cao, Zhiming Li, Tianpei Yang, Hao Zhang, Yan Zheng, Yi Li, Jianye Hao, Yang Liu

arXiv:2205.13728v119.423 citationsh-index: 24

Originality Incremental advance

AI Analysis

This addresses the need for more interpretable and generalizable AI in complex decision-making, though it appears incremental as it combines existing paradigms.

The paper tackles the problem of deep reinforcement learning lacking high-order intelligence like logic deduction, by proposing GALOIS, a framework that synthesizes hierarchical and strict cause-effect logic programs, resulting in superior asymptotic performance, generalizability, and knowledge reusability across various decision-making tasks.

Despite achieving superior performance in human-level control problems, unlike humans, deep reinforcement learning (DRL) lacks high-order intelligence (e.g., logic deduction and reuse), thus it behaves ineffectively than humans regarding learning and generalization in complex problems. Previous works attempt to directly synthesize a white-box logic program as the DRL policy, manifesting logic-driven behaviors. However, most synthesis methods are built on imperative or declarative programming, and each has a distinct limitation, respectively. The former ignores the cause-effect logic during synthesis, resulting in low generalizability across tasks. The latter is strictly proof-based, thus failing to synthesize programs with complex hierarchical logic. In this paper, we combine the above two paradigms together and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical and strict cause-effect logic programs. GALOIS leverages the program sketch and defines a new sketch-based hybrid program language for guiding the synthesis. Based on that, GALOIS proposes a sketch-based program synthesis method to automatically generate white-box programs with generalizable and interpretable cause-effect logic. Extensive evaluations on various decision-making tasks with complex logic demonstrate the superiority of GALOIS over mainstream baselines regarding the asymptotic performance, generalizability, and great knowledge reusability across different environments.

View on arXiv PDF

Similar