Metafeatures-based Rule-Extraction for Classifiers on Behavioral and Textual Data
This work addresses the challenge of providing comprehensible explanations for black-box models trained on behavioral and textual data, though it appears incremental, since it builds on existing rule-extraction techniques.
The paper tackles the problem of interpreting complex machine learning models on high-dimensional, sparse behavioral and textual data by developing a rule-extraction methodology based on higher-level metafeatures, resulting in explanations that better mimic the black-box model's behavior as measured by fidelity.
Machine learning on behavioral and textual data can yield highly accurate prediction models, but these models are often very difficult to interpret. Rule-extraction techniques have been proposed to combine the desired predictive accuracy of complex "black-box" models with global explainability. However, rule-extraction in the context of high-dimensional, sparse data, where many features are relevant to the predictions, can be challenging, as replacing the black-box model with many rules again leaves the user with an incomprehensible explanation. To address this problem, we develop and test a rule-extraction methodology based on higher-level, less-sparse metafeatures. A key finding of our analysis is that metafeatures-based explanations are better at mimicking the behavior of the black-box prediction model, as measured by the fidelity of explanations.
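
To make the two central notions concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a metafeature is a named group of fine-grained features and that fidelity is the fraction of instances on which the extracted rule agrees with the black-box prediction; the abstract specifies neither construction, and the function names, the grouping, the toy instances, and the black-box outputs are all hypothetical.

```python
from typing import Dict, Sequence, Set


def to_metafeatures(active: Set[str], groups: Dict[str, str]) -> Set[str]:
    """Map the active fine-grained features of one instance to their groups.

    Features without a known group are dropped (an assumption of this sketch).
    """
    return {groups[f] for f in active if f in groups}


def fidelity(black_box_preds: Sequence[int], explanation_preds: Sequence[int]) -> float:
    """Fraction of instances on which the explanation agrees with the black box."""
    if len(black_box_preds) != len(explanation_preds):
        raise ValueError("prediction sequences must have the same length")
    if len(black_box_preds) == 0:
        raise ValueError("need at least one prediction")
    agree = sum(b == e for b, e in zip(black_box_preds, explanation_preds))
    return agree / len(black_box_preds)


# Hypothetical grouping of sparse behavioral features (visited sites) into
# less-sparse metafeatures (site categories).
GROUPS = {
    "espn.com": "sports",
    "nba.com": "sports",
    "vogue.com": "fashion",
    "elle.com": "fashion",
}

# Each instance is the set of fine-grained features that are active for it.
INSTANCES = [
    {"espn.com"},
    {"nba.com", "vogue.com"},
    {"elle.com"},
    {"vogue.com"},
]

# Stand-in for the output of a trained black-box classifier on INSTANCES.
BLACK_BOX_PREDS = [1, 1, 0, 1]


def sports_rule(metafeatures: Set[str]) -> int:
    """A single extracted rule: predict 1 if the 'sports' metafeature is active."""
    return 1 if "sports" in metafeatures else 0


RULE_PREDS = [sports_rule(to_metafeatures(x, GROUPS)) for x in INSTANCES]
SCORE = fidelity(BLACK_BOX_PREDS, RULE_PREDS)
```

In this toy setting one rule over a single metafeature stands in for separate rules over `espn.com` and `nba.com`, which illustrates why explanations over metafeatures can stay short when many fine-grained features matter. The rule disagrees with the stand-in black box on the last instance, so `SCORE` is below 1.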