LGMay 21, 2025

Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs

arXiv:2505.16053v13 citationsh-index: 3
Originality Highly original
AI Analysis

This work addresses the challenge of hand-crafted heuristics in SAT solvers, a foundational area in computer science, by offering a data-driven approach that is incremental but shows strong gains.

The paper tackles the problem of improving Boolean Satisfiability (SAT) solver performance by introducing Reinforcement Learning from Algorithm Feedback (RLAF) to guide branching heuristics with Graph Neural Networks (GNNs), resulting in more than a 2x speedup in mean solve times across diverse SAT problems.

Boolean Satisfiability (SAT) solvers are foundational to computer science, yet their performance typically hinges on hand-crafted heuristics. This work introduces Reinforcement Learning from Algorithm Feedback (RLAF) as a paradigm for learning to guide SAT solver branching heuristics with Graph Neural Networks (GNNs). Central to our approach is a novel and generic mechanism for injecting inferred variable weights and polarities into the branching heuristics of existing SAT solvers. In a single forward pass, a GNN assigns these parameters to all variables. Casting this one-shot guidance as a reinforcement learning problem lets us train the GNN with off-the-shelf policy-gradient methods, such as GRPO, directly using the solver's computational cost as the sole reward signal. Extensive evaluations demonstrate that RLAF-trained policies significantly reduce the mean solve times of different base solvers across diverse SAT problem distributions, achieving more than a 2x speedup in some cases, while generalizing effectively to larger and harder problems after training. Notably, these policies consistently outperform expert-supervised approaches based on learning handcrafted weighting heuristics, offering a promising path towards data-driven heuristic design in combinatorial optimization.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes