Robust Optimal Classification Trees Against Adversarial Examples
This work addresses the need for explainable and robust models in security-critical domains, though its contribution is incremental in that it builds on existing adversarial learning frameworks.
The paper tackles the adversarial vulnerability of decision trees by proposing ROCT, a collection of methods for training trees that are optimally robust against user-specified attack models, and reports state-of-the-art scores in its experiments.
Decision trees are a popular choice of explainable model, but just like neural networks, they suffer from adversarial examples. Existing algorithms for fitting decision trees robust against adversarial examples are greedy heuristics and lack approximation guarantees. In this paper we propose ROCT, a collection of methods to train decision trees that are optimally robust against user-specified attack models. We show that the min-max optimization problem that arises in adversarial learning can be solved using a single minimization formulation for decision trees with 0-1 loss. We propose such formulations in Mixed-Integer Linear Programming and Maximum Satisfiability, which widely available solvers can optimize. We also present a method that determines the upper bound on adversarial accuracy for any model using bipartite matching. Our experimental results demonstrate that the existing heuristics achieve close to optimal scores while ROCT achieves state-of-the-art scores.
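The bipartite-matching bound mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions that the abstract does not state: labels are binary (0 and 1), the attack model is an L-infinity ball of radius `epsilon` around each sample, and two samples of different classes conflict when those closed balls intersect (distance at most `2 * epsilon`). Under these assumptions, no model can classify both samples of a conflicting pair correctly under attack, so the samples that are robustly correct form an independent set in the conflict graph. In a bipartite graph the largest independent set has size n minus the maximum matching, which gives the bound. The function names `linf_distance`, `max_bipartite_matching`, and `adversarial_accuracy_bound` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def linf_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L-infinity distance between two feature vectors of equal length."""
    return max(abs(x - y) for x, y in zip(a, b))


def max_bipartite_matching(adj: List[List[int]], n_right: int) -> int:
    """Size of a maximum matching, found with simple augmenting paths.

    adj[u] lists the right-side vertices adjacent to left-side vertex u.
    """
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v

    def try_augment(u: int, seen: List[bool]) -> bool:
        for v in adj[u]:
            if seen[v]:
                continue
            seen[v] = True
            # Take v if it is free, or if its current partner can be re-matched.
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, [False] * n_right) for u in range(len(adj)))


def adversarial_accuracy_bound(
    X: Sequence[Sequence[float]], y: Sequence[int], epsilon: float
) -> float:
    """Upper bound on adversarial accuracy for any model (illustrative sketch).

    Assumes binary labels and an L-infinity attack of radius epsilon.
    """
    left = [x for x, label in zip(X, y) if label == 0]
    right = [x for x, label in zip(X, y) if label == 1]
    # Edge between samples of different classes whose perturbation balls overlap.
    adj = [
        [j for j, r in enumerate(right) if linf_distance(l, r) <= 2 * epsilon]
        for l in left
    ]
    matching = max_bipartite_matching(adj, len(right))
    # Each matched pair forces at least one error under attack.
    return (len(X) - matching) / len(X)


# Toy data: both class-0 points conflict with the class-1 point at 0.1.
X_demo = [[0.0], [0.1], [1.0], [0.15]]
y_demo = [0, 1, 1, 0]
print(adversarial_accuracy_bound(X_demo, y_demo, epsilon=0.1))  # 0.75
```

In the toy data the two class-0 samples both conflict with the same class-1 sample, so the maximum matching has size 1 and at most 3 of the 4 samples can be robustly correct. The recursive augmenting-path search is adequate for small examples; for larger datasets a Hopcroft-Karp implementation from an established graph library would likely be a better choice.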