Human Understandable Explanation Extraction for Black-box Classification Models Based on Matrix Factorization
The work addresses the need for interpretability in AI systems such as defect detection or diagnosis services, where the decision logic must be understood before deployment. Its contribution is incremental, however, in that it applies existing techniques to explanation extraction.
The paper tackles the problem of explaining black-box classification models by proposing a method based on matrix factorization to extract human-understandable, rule-like explanations, and validates it on open and industry datasets, showing reasonable results.
In recent years, a number of artificial intelligence services have been developed, such as defect detection systems and diagnosis systems for customer service. Unfortunately, the core of these services is a black box whose decision-making logic humans cannot understand, even though inspecting that logic is crucial before launching a commercial service. Our goal in this paper is to propose an analytic method of model explanation that is applicable to general classification models. To this end, we introduce the concepts of a contribution matrix and of an explanation embedding in a constraint space obtained by matrix factorization. We extract a rule-like model explanation from the contribution matrix using nonnegative matrix factorization. To validate our method, we report experimental results on open datasets as well as on an industry dataset for LTE network diagnosis; the results show that our method extracts reasonable explanations.
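The factorization step can be illustrated with a minimal sketch. The abstract does not define the contribution matrix or the exact factorization procedure, so everything below is an assumption made for illustration: the matrix C is taken to have one row per sample and one column per feature, with nonnegative entries measuring how much each feature contributed to the model's prediction; the toy values and feature names are made up; and the standard Lee-Seung multiplicative updates stand in for whichever NMF solver the authors used. Under those assumptions, each row of H is a pattern over features, and the features that dominate a pattern can be read as a rule-like explanation shared by a group of samples.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(c, w, h):
    """Squared Frobenius norm of C - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for c_row, a_row in zip(c, wh) for x, y in zip(c_row, a_row))

def nmf(c, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate C by W H with W, H >= 0 using multiplicative updates.

    This is the generic Lee-Seung scheme, used here as a stand-in;
    it is not claimed to be the paper's algorithm.
    """
    rng = random.Random(seed)
    n, m = len(c), len(c[0])
    # Strictly positive random initialization keeps all updates well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T C) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, c), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(rank)]
        # W <- W * (C H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(c, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical contribution matrix (made-up values): samples 0-2 are
# driven mostly by f0 and f1, samples 3-5 mostly by f2 and f3.
FEATURES = ["f0", "f1", "f2", "f3"]
C = [
    [0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
    [1.0, 0.7, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 1.0],
    [0.0, 0.0, 0.8, 0.9],
]

W, H = nmf(C, rank=2)

# Read each row of H as a pattern and list the features that dominate it.
for r, row in enumerate(H):
    top = [FEATURES[j] for j, v in enumerate(row) if v >= 0.5 * max(row)]
    print(f"component {r}: dominant features {top}")
```

The nonnegativity constraint is what makes the factors readable as additive parts: a sample's contributions are reconstructed as a nonnegative mix of patterns, with no cancellation between them. The rank (2 here) and the 0.5 threshold for "dominant" are arbitrary choices for this toy example.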