Rectified Decision Trees: Exploring the Landscape of Interpretable and Effective Machine Learning
This work addresses the need for interpretable and effective models in real-world applications, offering an incremental improvement over standard decision trees.
The authors tackled the challenge of balancing interpretability and effectiveness in machine learning by proposing rectified decision trees (ReDT), which are trained on soft labels obtained through knowledge distillation. ReDT maintains the interpretability of decision trees while reducing model size and achieving competitive accuracy.
Interpretability and effectiveness are two essential requirements for adopting machine learning methods in practice. In this paper, we propose a knowledge-distillation-based extension of decision trees, dubbed rectified decision trees (ReDT), to explore the possibility of fulfilling both requirements simultaneously. Specifically, we extend the splitting criteria and the ending condition of standard decision trees, which allows training with soft labels while preserving the deterministic splitting paths. We then train the ReDT on soft labels distilled from a well-trained teacher model through a novel jackknife-based method. Accordingly, ReDT preserves the interpretable nature of decision trees while achieving relatively good performance. The effectiveness of adopting soft labels instead of hard ones is also analyzed empirically and theoretically. Surprisingly, experiments indicate that the introduction of soft labels also reduces the model size compared with standard decision trees, in terms of the total number of nodes and rules, which is an unexpected gift from the 'dark knowledge' distilled from the teacher model.
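To make the idea of a splitting criterion that accepts soft labels concrete, the following is a minimal sketch and not the criterion defined in the paper, whose exact form is not given in this abstract. It assumes that a node's class distribution is the mean of the soft labels of the samples it contains and that Gini impurity is computed on that mean. The function names `soft_gini`, `split_gain`, and `best_threshold` are hypothetical. The split itself remains a deterministic threshold test on a single feature, so each sample still follows exactly one path.

```python
import numpy as np

def soft_gini(soft_labels):
    """Gini impurity of a node, computed from the mean of its soft labels.

    Assumption: the node's class distribution is the average of the
    per-sample soft labels. With one-hot (hard) labels this reduces to
    the ordinary Gini impurity.
    """
    p = np.asarray(soft_labels, dtype=float).mean(axis=0)
    return 1.0 - float(np.sum(p ** 2))

def split_gain(feature, soft_labels, threshold):
    """Impurity decrease for the deterministic split `feature <= threshold`."""
    feature = np.asarray(feature, dtype=float)
    soft_labels = np.asarray(soft_labels, dtype=float)
    left = feature <= threshold
    n, n_left = len(feature), int(left.sum())
    if n_left == 0 or n_left == n:
        return 0.0  # degenerate split: all samples fall on one side
    w = n_left / n
    return (soft_gini(soft_labels)
            - w * soft_gini(soft_labels[left])
            - (1.0 - w) * soft_gini(soft_labels[~left]))

def best_threshold(feature, soft_labels):
    """Scan midpoints between sorted unique feature values; return (threshold, gain)."""
    feature = np.asarray(feature, dtype=float)
    values = np.unique(feature)  # sorted unique values
    if len(values) < 2:
        return None, 0.0
    candidates = (values[:-1] + values[1:]) / 2.0
    gains = [split_gain(feature, soft_labels, t) for t in candidates]
    best = int(np.argmax(gains))
    return float(candidates[best]), float(gains[best])

# Illustrative data: teacher probabilities for two classes (made up for this sketch).
feature = [0.1, 0.2, 0.8, 0.9]
teacher_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
threshold, gain = best_threshold(feature, teacher_probs)
```

Under the same assumption, an ending condition could be expressed as stopping when `soft_gini` of a node falls below a chosen tolerance, since soft labels rarely produce a perfectly pure node.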