Knowledge extraction from the learning of sequences in a long short term memory (LSTM) architecture
The work provides a method for making black-box recurrent neural networks (RNNs) more interpretable, an incremental advance in explainable AI for sequence learning tasks.
The authors address the problem of extracting interpretable knowledge from a trained LSTM that classifies sequences as valid or invalid with respect to an unknown generative automaton. They cluster the network's hidden states to build and validate a corresponding automaton, and illustrate the method on artificial grammars and a real-world use case.
We introduce a general method to extract knowledge from a recurrent neural network (Long Short-Term Memory, LSTM) that has learnt to detect whether a given input sequence is valid according to an unknown generative automaton. Based on a clustering of the hidden states, we explain how to build and validate an automaton that corresponds to the underlying (unknown) automaton and that can predict whether a given sequence is valid. The method is illustrated on artificial grammars (variations of Reber's grammar) as well as on a real use case whose underlying grammar is unknown.
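The idea of building an automaton from clustered hidden states can be illustrated with a minimal sketch. It is not the authors' implementation: the clustering algorithm (a small k-means), the majority-vote rule for transitions, the function names `kmeans`, `extract_automaton` and `accepts`, and the synthetic "hidden states" (noisy one-hot vectors standing in for real LSTM activations) are all assumptions made for illustration. The acceptance test is also simplified: a sequence is accepted if every step follows a transition observed during extraction, with no notion of final states.

```python
from collections import Counter, defaultdict

import numpy as np


def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation (assumed choice)."""
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assign every point to its nearest center, then recompute the centers.
        gaps = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(gaps, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def extract_automaton(symbol_seqs, hidden_seqs, k):
    """Cluster hidden states, then keep the most frequent target per (cluster, symbol).

    hidden_seqs[i] holds len(symbol_seqs[i]) + 1 vectors: the initial state
    followed by the state reached after each input symbol.
    """
    labels = kmeans(np.vstack(hidden_seqs), k)
    start = int(labels[0])  # cluster of the first initial state
    votes = defaultdict(Counter)
    pos = 0
    for symbols, hidden in zip(symbol_seqs, hidden_seqs):
        seq_labels = labels[pos:pos + len(hidden)]
        pos += len(hidden)
        for t, sym in enumerate(symbols):
            votes[(int(seq_labels[t]), sym)][int(seq_labels[t + 1])] += 1
    transitions = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    return start, transitions


def accepts(start, transitions, symbols):
    """Simplified validity check: every step must use an extracted transition."""
    state = start
    for sym in symbols:
        if (state, sym) not in transitions:
            return False
        state = transitions[(state, sym)]
    return True


# Hypothetical toy data: a two-state generator producing repetitions of "ab".
TRUE = {(0, "a"): 1, (1, "b"): 0}
rng = np.random.default_rng(0)
symbol_seqs = ["ab", "abab", "ababab"]
hidden_seqs = []
for seq in symbol_seqs:
    states = [0]
    for sym in seq:
        states.append(TRUE[(states[-1], sym)])
    # Noisy one-hot vectors stand in for the hidden states of a trained LSTM.
    hidden_seqs.append(np.eye(2)[states] + 0.05 * rng.normal(size=(len(states), 2)))

start, transitions = extract_automaton(symbol_seqs, hidden_seqs, k=2)
print(start, transitions)
print(accepts(start, transitions, "abab"), accepts(start, transitions, "aa"))
```

With a real network, `hidden_seqs` would be replaced by the hidden (or cell) states recorded while the trained LSTM reads each sequence, and the number of clusters `k` would have to be chosen and validated, for example by checking how well the extracted automaton agrees with the network on held-out sequences.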