Explaining Classifiers Trained on Raw Hierarchical Multiple-Instance Data
This work addresses the need for interpretability in machine learning models applied to structured data such as security logs, though its contribution is incremental because it builds on existing Hierarchical Multiple Instance Learning (HMIL) methods.
The paper tackles the problem of explaining classifiers trained on raw hierarchical multiple-instance data, a problem for which dedicated methods are lacking, and demonstrates that interpretable explanations can be generated with an order-of-magnitude speed-up and higher quality than a technique adapted from graph neural networks.
Learning from raw data input, thus limiting the need for feature engineering, is a component of many successful applications of machine learning methods in various domains. While many problems naturally translate into a vector representation directly usable in standard classifiers, a number of data sources have the natural form of structured data interchange formats (e.g., security logs in JSON/XML format). Existing methods, such as Hierarchical Multiple Instance Learning (HMIL), allow learning from such data in its raw form. However, the explanation of classifiers trained on raw structured data remains largely unexplored. By treating the explanation of these models as a subset selection problem, we demonstrate how interpretable explanations with favourable properties can be generated using computationally efficient algorithms. We compare against an explanation technique adapted from graph neural networks, showing an order-of-magnitude speed-up and higher-quality explanations.
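To make the subset-selection view concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a JSON-like sample, a hypothetical stand-in classifier `toy_score`, and a simple greedy backward elimination that drops every leaf whose removal keeps the classifier's score at or above a chosen threshold. The names `flatten`, `toy_score`, and `greedy_explanation`, the sample fields, and the threshold are all illustrative assumptions. A real HMIL model would operate on the tree structure itself rather than on flattened leaves.

```python
from typing import Any, Callable, Dict, Tuple

Path = Tuple[Any, ...]
Leaves = Dict[Path, Any]


def flatten(node: Any, prefix: Path = ()) -> Leaves:
    """Map every leaf of a JSON-like tree to the path that reaches it."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    leaves: Leaves = {}
    for key, child in items:
        leaves.update(flatten(child, prefix + (key,)))
    return leaves


def toy_score(leaves: Leaves) -> float:
    """Hypothetical classifier (an assumption, not an HMIL model):
    scores a sample by a suspicious port and by executable file names."""
    score = 0.0
    for path, value in leaves.items():
        if path and path[-1] == "port" and value == 4444:
            score += 0.6
        if isinstance(value, str) and value.endswith(".exe"):
            score += 0.3
    return min(score, 1.0)


def greedy_explanation(
    leaves: Leaves, score: Callable[[Leaves], float], threshold: float
) -> Leaves:
    """Greedy backward elimination: drop each leaf in turn and keep the
    removal only if the score stays at or above the threshold."""
    kept = dict(leaves)
    for path in list(leaves):
        trial = {p: v for p, v in kept.items() if p != path}
        if score(trial) >= threshold:
            kept = trial
    return kept


# Illustrative sample; the field names are made up for this sketch.
sample = {
    "host": "10.0.0.5",
    "connections": [
        {"port": 443, "bytes": 1200},
        {"port": 4444, "bytes": 90},
    ],
    "files": ["report.pdf", "update.exe"],
}

explanation = greedy_explanation(flatten(sample), toy_score, threshold=0.5)
print(explanation)
```

In this toy setting the explanation shrinks to the single leaf that alone keeps the score above the threshold. The result of a greedy pass generally depends on the order in which leaves are visited, so it yields a small sufficient subset rather than a guaranteed smallest one.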