QuantifyML: How Good is my Machine Learning Model?
This work addresses the problem of unreliable model evaluation for ML practitioners, though the contribution is incremental, since it builds on existing formal verification techniques.
The paper tackles the problem of misleading accuracy metrics in machine learning by introducing QuantifyML, a method that translates trained models into C programs and uses model checking and model counting to precisely quantify model generalization, learnability, and safety. This enables comparisons between algorithms such as decision trees and neural networks.
The efficacy of machine learning models is typically determined by computing their accuracy on test data sets. However, this can be misleading, since the test data may not be representative of the problem being studied. With QuantifyML we aim to precisely quantify the extent to which machine learning models have learned and generalized from the given data. Given a trained model, QuantifyML translates it into a C program and feeds it to the CBMC model checker to produce a formula in Conjunctive Normal Form (CNF). The formula is analyzed with off-the-shelf model counters to obtain precise counts with respect to different model behaviors. QuantifyML enables (i) evaluating learnability by comparing the counts for the outputs to ground truth, expressed as logical predicates, (ii) comparing the performance of models built with different machine learning algorithms (decision trees vs. neural networks), and (iii) quantifying the safety and robustness of models.
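To make the idea of counting model behaviors against a ground-truth predicate concrete, the following is a minimal sketch in C, not the QuantifyML implementation. It assumes a hypothetical two-feature decision tree (`tree_predict`), an illustrative ground-truth predicate (`ground_truth`, here `x > y`), and a small bounded input domain (`DOMAIN_MAX`); all of these names and values are invented for illustration. Because the domain is tiny, plain enumeration stands in for the CBMC-to-CNF translation and the model counter described above, which would be needed for realistic input spaces.

```c
#include <assert.h>

/* Assumption: each of the two integer features ranges over [0, DOMAIN_MAX]. */
enum { DOMAIN_MAX = 7 };

/* Exact counts of the four possible model behaviors over the whole domain. */
typedef struct {
    unsigned tp; /* model says 1, ground truth is 1 */
    unsigned fp; /* model says 1, ground truth is 0 */
    unsigned tn; /* model says 0, ground truth is 0 */
    unsigned fn; /* model says 0, ground truth is 1 */
} Counts;

/* Ground truth expressed as a logical predicate (illustrative choice). */
int ground_truth(int x, int y) { return x > y; }

/* Hypothetical trained decision tree, written as the kind of nested
 * if/else C code a tree could be translated into. */
int tree_predict(int x, int y) {
    if (x <= 3) {
        return (y <= 1) ? 1 : 0;
    }
    return (y <= 5) ? 1 : 0;
}

/* Enumerate every input in [0, max_value]^2 and tally each behavior.
 * This brute-force loop replaces model counting only because the
 * domain here is small enough to visit exhaustively. */
Counts count_behaviors(int max_value) {
    assert(max_value >= 0);
    Counts c = {0, 0, 0, 0};
    for (int x = 0; x <= max_value; ++x) {
        for (int y = 0; y <= max_value; ++y) {
            int predicted = tree_predict(x, y);
            int expected = ground_truth(x, y);
            if (predicted && expected) c.tp++;
            else if (predicted && !expected) c.fp++;
            else if (!predicted && !expected) c.tn++;
            else c.fn++;
        }
    }
    return c;
}

/* Accuracy over the entire input domain rather than over a test sample. */
double exact_accuracy(Counts c) {
    unsigned total = c.tp + c.fp + c.tn + c.fn;
    assert(total > 0);
    return (double)(c.tp + c.tn) / (double)total;
}
```

Calling `count_behaviors(DOMAIN_MAX)` on this toy tree yields 26 true positives, 6 false positives, 30 true negatives, and 2 false negatives, so the tree agrees with the predicate on 56 of the 64 possible inputs. A test set that happened to omit those 8 inputs would report perfect accuracy, which is the kind of gap between sampled and exact evaluation that the abstract describes.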