Improved cross-validation for classifiers that make algorithmic choices to minimise runtime without compromising output correctness
This work addresses runtime optimization in computer algebra systems, an incremental improvement for domain-specific applications.
The paper addresses the selection of variable orderings for cylindrical algebraic decomposition, a choice that affects runtime but not correctness. It modifies the cross-validation algorithms of the classifiers so that they distinguish between non-optimal orderings, which leads to improved results in experiments on problems with more variables.
Our topic is the use of machine learning (ML) to improve software by making choices which do not compromise the correctness of the output, but do affect the time taken to produce it. We are particularly concerned with computer algebra systems (CASs); our experiments concern selecting the variable ordering to use when performing a cylindrical algebraic decomposition of $n$-dimensional real space with respect to the signs of a set of polynomials. In our prior work we explored the different ML models that could be used, and how to identify suitable features of the input polynomials. In the present paper we both repeat our prior experiments on problems which have more variables (and thus exponentially more possible orderings), and examine the metric which our ML classifiers target. The natural metric is computational runtime, with classifiers trained to pick the ordering which minimises this. However, this leads to the situation where models do not distinguish between any of the non-optimal orderings, whose runtimes may still vary dramatically. In this paper we investigate a modification to the cross-validation algorithms of the classifiers so that they do distinguish these cases, leading to improved results.
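To illustrate why an accuracy-style target cannot separate non-optimal orderings, the following minimal Python sketch compares two hypothetical classifiers on made-up runtimes. It is not the modification investigated in the paper; the function names `accuracy_score` and `total_runtime`, the runtime table, and the two sets of predictions are all assumptions introduced purely for illustration.

```python
from typing import Sequence


def accuracy_score(runtimes: Sequence[Sequence[float]],
                   predictions: Sequence[int]) -> float:
    """Fraction of problems for which the predicted ordering is a fastest one."""
    hits = sum(
        1
        for times, pick in zip(runtimes, predictions)
        if times[pick] == min(times)
    )
    return hits / len(runtimes)


def total_runtime(runtimes: Sequence[Sequence[float]],
                  predictions: Sequence[int]) -> float:
    """Total time spent if each problem is run with its predicted ordering."""
    return sum(times[pick] for times, pick in zip(runtimes, predictions))


# Hypothetical runtimes in seconds: one row per problem, one column per ordering.
runtimes = [
    [1.0, 2.0, 50.0],
    [4.0, 3.0, 60.0],
    [2.0, 2.5, 40.0],
]

# Hypothetical predictions (index of the chosen ordering for each problem).
model_a = [0, 0, 1]  # optimal once, then two near-optimal choices
model_b = [0, 2, 2]  # optimal once, then two very slow choices

acc_a = accuracy_score(runtimes, model_a)
acc_b = accuracy_score(runtimes, model_b)
time_a = total_runtime(runtimes, model_a)
time_b = total_runtime(runtimes, model_b)

print(f"accuracy: A={acc_a:.2f}, B={acc_b:.2f}")
print(f"total runtime: A={time_a:.1f}s, B={time_b:.1f}s")
```

Both hypothetical models pick the optimal ordering for the same share of problems, so an accuracy-based score ranks them equally, while a runtime-based score used during cross-validation would prefer the first.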