LG · May 7

Target-Aware Data Augmentation for SAT Prediction

arXiv:2605.06931 · 11.5
Predicted impact: top 53% in LG · last 90 days · Originality: Incremental advance
AI Analysis

For researchers applying machine learning to NP-hard problems like SAT, this work addresses the data scarcity bottleneck by enabling scalable, task-aligned synthetic data generation.

The paper tackles the high cost of generating labeled training data for SAT prediction by proposing a solver-free data augmentation framework that produces correctly labeled instances aligned with target benchmarks, achieving orders-of-magnitude speedups in data generation.
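To make "correctly labeled by construction" concrete, here is a minimal sketch using two textbook constructions, not the paper's own method, which this summary does not spell out. SAT instances are built around a hidden (planted) assignment. UNSAT instances embed a small core containing all 2^k sign patterns over k variables, which no assignment can satisfy. The function names `planted_sat`, `planted_unsat`, and `satisfies` are hypothetical, and the sketch does no alignment with a target benchmark.

```python
import itertools
import random


def planted_sat(num_vars, num_clauses, k=3, seed=0):
    """Random k-CNF that is satisfiable by construction (hidden assignment)."""
    rng = random.Random(seed)
    assignment = {v: rng.choice([True, False]) for v in range(1, num_vars + 1)}
    clauses = []
    while len(clauses) < num_clauses:
        variables = rng.sample(range(1, num_vars + 1), k)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        # Keep only clauses that the hidden assignment satisfies.
        if any((lit > 0) == assignment[abs(lit)] for lit in clause):
            clauses.append(clause)
    return clauses, assignment


def planted_unsat(num_vars, num_clauses, k=3, seed=0):
    """Random k-CNF that is unsatisfiable by construction (embedded core)."""
    rng = random.Random(seed)
    core_vars = rng.sample(range(1, num_vars + 1), k)
    # All 2^k sign patterns over the same k variables: every assignment
    # falsifies exactly one of these clauses, so the formula is UNSAT.
    core = [
        [v * s for v, s in zip(core_vars, signs)]
        for signs in itertools.product([1, -1], repeat=k)
    ]
    filler, _ = planted_sat(num_vars, max(0, num_clauses - len(core)), k, seed + 1)
    clauses = core + filler
    rng.shuffle(clauses)
    return clauses


def satisfies(clauses, assignment):
    """Check a full assignment against a CNF in DIMACS-style literal lists."""
    return all(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )
```

Both labels come for free, with no solver call. The obvious weakness is that such an explicit core is easy to spot, which is presumably why a practical generator has to match the structure of the target benchmark rather than stop here.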

Learning-based approaches to NP-hard problems have shown increasing promise, but their progress is fundamentally constrained by the high cost of generating labeled training data. In domains such as Boolean satisfiability (SAT), standard pipelines rely on solver-in-the-loop labeling, which scales poorly with problem size and limits the amount of usable supervision. This bottleneck hinders the broader goal of leveraging machine learning to capture structure in hard combinatorial problems. In this work, we propose a target-aware, solver-free data generation framework for SAT that produces correctly labeled SAT and UNSAT instances by construction, eliminating the need for expensive solver calls. Our method aligns generated instances with the structural properties of a target benchmark, making synthetic data effective for downstream learning. We further develop a linear-programming-aware graph neural network (LPGNN) architecture that incorporates constraint-violation residuals into message passing, enabling the model to exploit underlying optimization structure. Together, these contributions support a data-centric paradigm for learning on NP-hard problems, where scalable, task-aligned data generation is as critical as model design. Our approach yields orders-of-magnitude speedups in data generation, demonstrating that benchmark-aligned synthetic data can effectively augment solver-labeled datasets for GNN-based SAT prediction.
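As an illustration of what a constraint-violation residual could look like, the sketch below uses the standard LP relaxation of CNF: each variable takes a value in [0, 1], a positive literal contributes x, a negated literal contributes 1 - x, and each clause requires the sum of its literal values to be at least 1. The residual is the amount by which that constraint is violated. This is an assumption for illustration only; the abstract does not define the residual LPGNN uses or how it enters message passing, and `clause_residuals` is a hypothetical name.

```python
def clause_residuals(clauses, x):
    """Per-clause violation of the LP constraint sum(literal values) >= 1.

    clauses: list of clauses, each a list of nonzero ints (DIMACS-style).
    x: dict mapping variable index to a relaxed value in [0, 1].
    """
    residuals = []
    for clause in clauses:
        lhs = sum(x[abs(lit)] if lit > 0 else 1.0 - x[abs(lit)] for lit in clause)
        # Zero when the relaxed clause constraint holds, positive otherwise.
        residuals.append(max(0.0, 1.0 - lhs))
    return residuals


# Example: (x1 OR NOT x2) AND (x2 OR x3) at the point x1=0, x2=1, x3=0.
example_clauses = [[1, -2], [2, 3]]
example_point = {1: 0.0, 2: 1.0, 3: 0.0}
example_residuals = clause_residuals(example_clauses, example_point)
```

In a GNN over the clause-variable graph, values like these could be attached to clause nodes as extra features at each round, which is one plausible reading of "incorporates constraint-violation residuals into message passing".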

