Clustering-Based Approaches for Symbolic Knowledge Extraction
This work addresses the need for interpretable models in critical applications, offering an incremental improvement over existing symbolic knowledge extraction methods.
The paper addresses the extraction of symbolic knowledge from opaque machine learning models. It proposes a clustering-based step that partitions the input space before rule extraction, with the aim of better performance across diverse data sets.
Opaque machine learning models are increasingly used across a wide range of application areas. Because these models act as black boxes (BB) from the human perspective, they cannot be fully trusted in critical applications unless there is a method to extract symbolic, human-readable knowledge from them. In this paper we analyse a recurrent design adopted by symbolic knowledge extractors for BB regressors, namely the creation of rules associated with hypercubic regions of the input space. We argue that this kind of partitioning may lead to suboptimal solutions when the data set at hand is high-dimensional or does not satisfy symmetric constraints. We then propose a (deep) clustering-based approach, to be performed before symbolic knowledge extraction, to achieve better performance with data sets of any kind.
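To make the idea of clustering before rule extraction concrete, the following is a minimal sketch and not the method of the paper. It rests on several assumptions: plain k-means with a deterministic farthest-point initialisation stands in for the (deep) clustering step, each cluster is summarised by its axis-aligned bounding hypercube, and each rule returns a constant output equal to the mean black-box prediction in that cluster. The names `kmeans`, `extract_rules`, `predict` and `black_box` are hypothetical and chosen for illustration only.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic initialisation: start at the first point, then keep
    adding the point farthest from all centres chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step).
    Returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Move each centre to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def extract_rules(points, bb_outputs, labels):
    """One rule per cluster: the cluster's bounding hypercube plus a constant
    output (mean of the black-box predictions inside the cluster)."""
    rules = []
    for j in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        members = [points[i] for i in idx]
        rules.append({
            "lower": tuple(min(c) for c in zip(*members)),
            "upper": tuple(max(c) for c in zip(*members)),
            "output": sum(bb_outputs[i] for i in idx) / len(idx),
        })
    return rules


def predict(rules, x):
    """Return the output of the first rule whose hypercube contains x,
    or None if no rule covers x."""
    for r in rules:
        if all(lo <= xi <= hi for lo, xi, hi in zip(r["lower"], x, r["upper"])):
            return r["output"]
    return None


# Toy data: two well-separated groups of points in 2D.
rng = random.Random(0)
points = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(50)]
points += [(rng.uniform(5, 6), rng.uniform(5, 6)) for _ in range(50)]


def black_box(p):
    """Hypothetical stand-in for an opaque regressor's prediction."""
    return p[0] + p[1]


bb_outputs = [black_box(p) for p in points]
labels = kmeans(points, k=2)
rules = extract_rules(points, bb_outputs, labels)

for r in rules:
    print(f"if x in [{r['lower']}, {r['upper']}] then y = {r['output']:.3f}")
```

In this sketch the hypercubes follow the clusters found in the data rather than a fixed or symmetric grid over the input space, so the empty region between the two groups is covered by no rule. Bounding boxes of different clusters can overlap in general; here the first matching rule wins, which is a simplification.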