LG · Aug 16, 2022

FOLD-SE: An Efficient Rule-based Machine Learning Algorithm with Scalable Explainability

arXiv:2208.07912v2 · 14.6 · 19 citations · h-index: 9

Originality: Incremental advance

AI Analysis

The paper addresses the need for scalable explainability in machine learning for users who require interpretable models. It builds incrementally on prior work (FOLD-R++).

The paper tackles the problem of building an explainable machine learning algorithm for classifying tabular data. The result, FOLD-SE, maintains good accuracy with a small number of rules and literals, runs an order of magnitude faster than XGBoost and MLP, and outperforms other rule-learning algorithms in efficiency and scalability.
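FOLD-SE's trained model is a set of default rules, i.e. rules that hold unless an exception applies. The sketch below, a minimal illustration under stated assumptions rather than the authors' implementation, shows how such a model classifies a row and how its size can be measured in literals. The class `DefaultRule`, the function `predict`, the features (`income`, `credit_score`, `job`) and the rules themselves are all hypothetical, chosen only to show the shape of the model: numerical and categorical literals side by side, and exceptions evaluated with negation as failure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Literal = Callable[[Row], bool]  # a test on one row, e.g. income > 50


@dataclass
class DefaultRule:
    """Hypothetical sketch of a default rule  label(X) :- body(X), not ab(X).

    The rule fires when every body literal holds and none of its
    exception (abnormality) rules fires.
    """
    body: List[Literal]
    exceptions: List["DefaultRule"] = field(default_factory=list)

    def fires(self, row: Row) -> bool:
        if not all(lit(row) for lit in self.body):
            return False
        # Negation as failure: the default holds unless an exception is proven.
        return not any(exc.fires(row) for exc in self.exceptions)

    def literal_count(self) -> int:
        # Counts body literals of this rule plus those of all nested exceptions.
        return len(self.body) + sum(e.literal_count() for e in self.exceptions)


def predict(rules: List[DefaultRule], row: Row, positive: str, negative: str) -> str:
    # Any firing rule yields the positive label; otherwise the negative one.
    return positive if any(r.fires(row) for r in rules) else negative


# Hypothetical toy model over made-up features; NOT rules learned by FOLD-SE.
#   approve(X) :- income(X) > 50, not ab1(X).
#   ab1(X)     :- credit_score(X) =< 600.
#   approve(X) :- job(X) = 'teacher'.   (categorical literal, no one-hot encoding)
model = [
    DefaultRule(
        body=[lambda r: r["income"] > 50],
        exceptions=[DefaultRule(body=[lambda r: r["credit_score"] <= 600])],
    ),
    DefaultRule(body=[lambda r: r["job"] == "teacher"]),
]

print(predict(model, {"income": 80, "credit_score": 700, "job": "clerk"}, "approve", "reject"))
print(predict(model, {"income": 80, "credit_score": 550, "job": "clerk"}, "approve", "reject"))
print(sum(r.literal_count() for r in model))
```

The first row is approved by the default rule; the second is rejected because the exception `ab1` fires. The whole toy model has three literals, which is the kind of size measure the summary refers to when it says FOLD-SE keeps the number of rules and literals small.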

We present FOLD-SE, an efficient, explainable machine learning algorithm for classification tasks on tabular data containing numerical and categorical values. FOLD-SE generates a set of default rules (essentially a stratified normal logic program) as an explainable trained model. The explainability provided by FOLD-SE is scalable: regardless of the size of the dataset, the number of learned rules and learned literals stays quite small while good classification accuracy is maintained. A model with fewer rules and literals is easier for humans to understand. FOLD-SE is competitive with state-of-the-art machine learning algorithms such as XGBoost and Multi-Layer Perceptrons (MLP) with respect to prediction accuracy. However, unlike XGBoost and MLP, FOLD-SE is explainable. FOLD-SE builds on our earlier work on the explainable FOLD-R++ machine learning algorithm for binary classification and inherits all of its positive features. Thus, pre-processing of the dataset with techniques such as one-hot encoding is not needed. Like FOLD-R++, FOLD-SE uses prefix sums to speed up computation, which makes it an order of magnitude faster than XGBoost and MLP in execution speed. FOLD-SE outperforms FOLD-R++ as well as other rule-learning algorithms such as RIPPER in efficiency, performance, and scalability, especially for large datasets. A major reason for the scalable explainability of FOLD-SE is its literal selection heuristic based on Gini impurity, as opposed to the information gain used in FOLD-R++. A multi-category classification version of FOLD-SE is also presented.
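The abstract credits two ingredients for speed and compact models: prefix sums and a Gini-impurity-based literal selection heuristic. The sketch below shows how the two combine for one numerical feature: after sorting, a running (prefix) count of positives gives the class counts on each side of every candidate literal `x <= t` in a single linear pass, and each split is scored by weighted Gini impurity. It is an assumption-laden illustration, not FOLD-SE's code: the names `gini` and `best_threshold` are hypothetical, the score is the textbook weighted Gini impurity of a binary split, and the paper's exact heuristic and its handling of categorical values may differ.

```python
from typing import List, Optional, Tuple


def gini(pos: int, neg: int) -> float:
    """Gini impurity 1 - p^2 - (1-p)^2 of a node with pos/neg examples."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values: List[float], labels: List[bool]) -> Tuple[Optional[float], float]:
    """Return (t, score) for the literal 'x <= t' with the lowest weighted Gini impurity.

    Sorting costs O(n log n); the prefix count then scores every candidate
    threshold in one O(n) pass instead of rescanning all rows per threshold.
    """
    pairs = sorted(zip(values, labels))
    total = len(pairs)
    total_pos = sum(1 for _, y in pairs if y)
    best_t: Optional[float] = None
    best_score = float("inf")
    pos_le = 0  # prefix sum: positives with value <= current x
    for i, (x, y) in enumerate(pairs):
        pos_le += int(y)
        # Evaluate only at the last occurrence of each distinct value.
        if i + 1 < total and pairs[i + 1][0] == x:
            continue
        n_le = i + 1
        if n_le == total:
            break  # 'x <= max' puts every row on one side: not a real split
        neg_le = n_le - pos_le
        pos_gt = total_pos - pos_le
        neg_gt = (total - n_le) - pos_gt
        score = (n_le * gini(pos_le, neg_le) + (total - n_le) * gini(pos_gt, neg_gt)) / total
        if score < best_score:
            best_t, best_score = x, score
    return best_t, best_score


print(best_threshold([1, 2, 3, 4, 5, 6], [True, True, True, False, False, False]))
```

On this toy input the literal `x <= 3` separates the classes perfectly, so its weighted impurity is 0. Replacing `gini` with an entropy-based score would give an information-gain style heuristic closer to what the abstract attributes to FOLD-R++; the prefix-sum pass is unchanged either way.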
