Revisiting Variable Ordering for Real Quantifier Elimination using Machine Learning
This work addresses data-bias issues relevant to researchers who use CAD in formal verification, but it is incremental, building on prior machine-learning approaches to variable-ordering selection.
The paper tackles bias in the training data of machine-learning models that select variable orderings for Cylindrical Algebraic Decomposition (CAD), a technique used to verify cyber-physical systems. It creates a new dataset of more than 41K MetiTarski challenges designed to remove this bias, and uses it to test how well the models generalize.
Cylindrical Algebraic Decomposition (CAD) is a key proof technique for formal verification of cyber-physical systems. CAD is computationally expensive, with worst-case doubly-exponential complexity. Selecting an optimal variable ordering is paramount to efficient use of CAD. Prior work has demonstrated that machine learning can be useful in determining efficient variable orderings. Much of this work has been driven by CAD problems extracted from applications of the MetiTarski theorem prover. In this paper, we revisit this prior work and consider issues of bias in existing training and test data. We observe that the classical MetiTarski benchmarks are heavily biased towards particular variable orderings. To address this, we apply symmetries to create a new dataset containing more than 41K MetiTarski challenges designed to remove bias. Furthermore, we evaluate issues of information leakage, and test the generalizability of our models on the new dataset.
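The abstract does not spell out how the symmetries are applied. The following is a minimal sketch under two assumptions. First, "applying symmetries" means renaming (permuting) the variables of each problem. Second, a CAD's cost is unchanged by such a renaming, so the best ordering is renamed along with the problem. Problems are encoded as lists of sparse polynomials (dicts from exponent tuples to integer coefficients). The names `relabel_poly`, `relabel_order`, `augment` and `grouped_split`, and the toy "best ordering" labels, are hypothetical; they are not taken from the paper or computed by any CAD tool. The split is done before augmentation so that all renamed copies of one problem land on the same side. This is one plausible way to avoid the kind of information leakage the abstract mentions, not necessarily the authors' procedure.

```python
import random
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical encoding: a polynomial maps exponent tuples (one entry per variable) to coefficients.
Poly = Dict[Tuple[int, ...], int]
Order = Tuple[int, ...]
Problem = Tuple[List[Poly], Order]  # (polynomial set, best variable ordering)


def relabel_poly(poly: Poly, perm: Order) -> Poly:
    """Rename variable i to variable perm[i] in every monomial."""
    out: Poly = {}
    for exps, coeff in poly.items():
        new_exps = [0] * len(exps)
        for i, e in enumerate(exps):
            new_exps[perm[i]] = e
        out[tuple(new_exps)] = coeff
    return out


def relabel_order(order: Order, perm: Order) -> Order:
    """Apply the same renaming to a variable ordering."""
    return tuple(perm[v] for v in order)


def augment(dataset: List[Problem]) -> List[Problem]:
    """Return every variable renaming of every problem, with its label renamed to match.
    Assumes CAD cost is invariant under renaming, so the renamed label stays optimal."""
    out: List[Problem] = []
    for polys, best in dataset:
        for perm in permutations(range(len(best))):
            out.append(([relabel_poly(p, perm) for p in polys], relabel_order(best, perm)))
    return out


def grouped_split(dataset: List[Problem], test_fraction: float, seed: int = 0):
    """Split the original problems first, then augment each side separately,
    so no renamed copy of a test problem appears in the training set."""
    idx = list(range(len(dataset)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(dataset) * test_fraction))
    test = [dataset[i] for i in idx[:n_test]]
    train = [dataset[i] for i in idx[n_test:]]
    return augment(train), augment(test)


# Toy problems in x0, x1, x2; the "best" orderings are placeholders, not CAD results.
toy: List[Problem] = [
    ([{(2, 0, 0): 1, (0, 1, 0): 1, (0, 0, 0): -1}, {(0, 1, 1): 1}], (0, 1, 2)),
    ([{(1, 1, 0): 1, (0, 0, 2): -3}], (0, 1, 2)),
]

print(Counter(label for _, label in toy))  # every label is (0, 1, 2): fully biased
augmented = augment(toy)
print(len(augmented), Counter(label for _, label in augmented))
train, test = grouped_split(toy, test_fraction=0.5)
print(len(train), len(test))
```

Because every toy problem carries the same label, the original label distribution is maximally skewed. After augmentation, each of the 3! = 6 orderings appears exactly once per source problem, so the label distribution becomes uniform by construction.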