Explaining Black Boxes on Sequential Data using Weighted Automata
This work addresses the need for interpretability in machine learning, particularly for sequential data, though the contribution is incremental because it builds on existing spectral methods for automata extraction.
The paper tackles the problem of globally interpreting black box models on sequential data by proposing a spectral algorithm to extract weighted automata solely through querying the black box, with experiments on 48 synthetic and 2 real datasets showing high-quality approximations.
Understanding how a learned black box works is of crucial interest for the future of Machine Learning. In this paper, we pioneer the study of the global interpretability of learned black box models that assign numerical values to symbolic sequential data. To tackle this task, we propose a spectral algorithm for the extraction of weighted automata (WA) from such black boxes. This algorithm does not require access to a dataset or to the inner representation of the black box: the inferred model can be obtained solely by querying the black box, feeding it with inputs and analyzing its outputs. Experiments using Recurrent Neural Networks (RNN) trained on a wide collection of 48 synthetic datasets and 2 real datasets show that the obtained approximation is of great quality.
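To make the query-only extraction concrete, here is a minimal sketch of the generic spectral recipe for weighted automata: fill a Hankel matrix by querying the black box on prefix/suffix concatenations, take a truncated SVD, and solve for the automaton's parameters. It is an illustration under stated assumptions, not the paper's exact procedure. The names `extract_wa`, `wa_value`, `words_up_to` and `count_a` are hypothetical, the basis of all words up to a fixed length is an assumption (the paper may choose its basis differently), and the toy black box stands in for a trained RNN.

```python
import itertools
import numpy as np

def words_up_to(alphabet, max_len):
    """All words of length 0..max_len; the empty word comes first."""
    return ["".join(t) for n in range(max_len + 1)
            for t in itertools.product(alphabet, repeat=n)]

def extract_wa(black_box, alphabet, rank, max_len=2):
    """Spectral extraction of a WA using only queries to black_box.

    Assumption: prefixes and suffixes are all words up to max_len.
    """
    prefixes = suffixes = words_up_to(alphabet, max_len)

    def hankel(middle=""):
        # Entry (p, s) is the black box value on p + middle + s.
        return np.array([[black_box(p + middle + s) for s in suffixes]
                         for p in prefixes], dtype=float)

    H = hankel()
    # Rank-truncated SVD: the top right singular vectors span the row space.
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    V = Vt[:rank].T
    HV_pinv = np.linalg.pinv(H @ V)

    alpha0 = H[0] @ V                # row of the empty prefix
    alpha_inf = HV_pinv @ H[:, 0]    # column of the empty suffix
    A = {c: HV_pinv @ hankel(c) @ V for c in alphabet}
    return alpha0, A, alpha_inf

def wa_value(wa, word):
    """Value the extracted WA assigns to word: alpha0 * A_x1 ... A_xn * alpha_inf."""
    alpha0, A, alpha_inf = wa
    state = alpha0
    for c in word:
        state = state @ A[c]
    return float(state @ alpha_inf)

# Toy black box (hypothetical stand-in for an RNN): number of 'a' symbols.
def count_a(word):
    return word.count("a")

wa = extract_wa(count_a, "ab", rank=2)
print(wa_value(wa, "abaab"), count_a("abaab"))
```

In this toy case the target function is itself computed by a two-state weighted automaton, so the extracted model should agree with the black box up to floating-point error. For a real RNN the Hankel matrix is only approximately low rank, and the choice of `rank` and of the prefix/suffix basis governs the quality of the approximation.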