Towards a General Framework for Predicting and Explaining the Hardness of Graph-based Combinatorial Optimization Problems using Machine Learning and Association Rule Mining
This addresses the problem of efficiently assessing and understanding problem difficulty for researchers and practitioners in combinatorial optimization, though it is incremental as it builds on existing machine learning and association rule mining techniques.
The study tackled predicting and explaining the computational hardness of graph-based combinatorial optimization problems by introducing the GCO-HPIF framework, which achieved a weighted F1 score of 0.9921 and an ROC-AUC of 0.9083 using only three graph features, with the best regression model for computation times achieving a percentage RMSE of 5.12 and an R2 of 0.991.
This study introduces GCO-HPIF, a general machine-learning-based framework to predict and explain the computational hardness of combinatorial optimization problems that can be represented on graphs. The framework consists of two stages. In the first stage, a dataset is created comprising problem-agnostic graph features and hardness classifications of problem instances. Machine-learning-based classification algorithms are trained to map graph features to hardness categories. In the second stage, the framework explains the predictions using an association rule mining algorithm. Additionally, machine-learning-based regression models are trained to predict algorithmic computation times. The GCO-HPIF framework was applied to a dataset of 3287 maximum clique problem instances compiled from the COLLAB, IMDB, and TWITTER graph datasets using five state-of-the-art algorithms, namely three exact branch-and-bound-based algorithms (Gurobi, CliSAT, and MOMC) and two graph-neural-network-based algorithms (EGN and HGS). The framework demonstrated excellent performance in predicting instance hardness, achieving a weighted F1 score of 0.9921, a minority-class F1 score of 0.878, and an ROC-AUC score of 0.9083 using only three graph features. The best association rule found by the FP-Growth algorithm for explaining the hardness predictions had a support of 0.8829 for hard instances and an overall accuracy of 87.64 percent, underscoring the framework's usefulness for both prediction and explanation. Furthermore, the best-performing regression model for predicting computation times achieved a percentage RMSE of 5.12 and an R2 value of 0.991.