Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples
This work addresses RNN interpretability for AI practitioners, though it is an incremental improvement on existing extraction methods.
The researchers tackle the problem of understanding the internal state dynamics of trained recurrent neural networks (RNNs). Their algorithm extracts a deterministic finite automaton (DFA) using exact learning and abstraction, and it efficiently produces accurate automata even for networks with large state vectors.
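We present a novel algorithm that uses exact learning and abstraction to extract a deterministic finite automaton describing the state dynamics of a given trained RNN. We do this using Angluin's L* algorithm as the learner and the trained RNN as the oracle: the oracle answers the learner's membership queries, and each equivalence query either accepts the current hypothesis or returns a counterexample on which the hypothesis and the network disagree. Our technique efficiently extracts accurate automata from trained RNNs, even when the state vectors are large and require fine differentiation.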
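To illustrate the learner/oracle split, here is a minimal Python sketch of a textbook L* variant. It is not the authors' implementation. `rnn_accepts` is a hypothetical stand-in for a trained RNN classifier (it accepts words over {a, b} with an even number of a's). The equivalence query is a swapped-in bounded exhaustive comparison on all words up to `max_len`, not the abstraction-based check the source describes. This variant adds every suffix of a counterexample to the experiment set `E`, which keeps the observation table consistent without a separate consistency check.

```python
from itertools import product

ALPHABET = ("a", "b")

def rnn_accepts(word: str) -> bool:
    # Hypothetical stand-in for a trained RNN classifier. A real oracle would
    # feed `word` through the network and threshold its output.
    return word.count("a") % 2 == 0

def row(prefix, E, mq):
    # Row of the observation table: membership answers for prefix + each experiment.
    return tuple(mq(prefix + e) for e in E)

def build_dfa(S, E, mq):
    # Each distinct row of a prefix in S becomes one DFA state.
    reps = {row(s, E, mq): s for s in S}
    start = row("", E, mq)
    accepting = {r for r, s in reps.items() if mq(s)}
    delta = {(r, c): row(s + c, E, mq) for r, s in reps.items() for c in ALPHABET}
    return start, accepting, delta

def run(dfa, word):
    # Classify a word with the hypothesis DFA.
    state, accepting, delta = dfa
    for c in word:
        state = delta[(state, c)]
    return state in accepting

def equivalence_query(dfa, mq, max_len):
    # Swapped-in equivalence check (an assumption, not the source's method):
    # compare hypothesis and oracle on every word up to max_len, shortest first.
    for n in range(max_len + 1):
        for letters in product(ALPHABET, repeat=n):
            w = "".join(letters)
            if run(dfa, w) != mq(w):
                return w  # counterexample
    return None

def lstar(mq, max_len=6):
    S, E = [""], [""]  # access prefixes (prefix-closed) and experiments (suffixes)
    while True:
        # Close the table: every one-letter extension of S must match a row of S.
        changed = True
        while changed:
            changed = False
            known = {row(s, E, mq) for s in S}
            for s in list(S):
                for c in ALPHABET:
                    r = row(s + c, E, mq)
                    if r not in known:
                        S.append(s + c)
                        known.add(r)
                        changed = True
        dfa = build_dfa(S, E, mq)
        cex = equivalence_query(dfa, mq, max_len)
        if cex is None:
            return dfa
        # Add every suffix of the counterexample as a new experiment.
        for i in range(len(cex)):
            if cex[i:] not in E:
                E.append(cex[i:])
```

For this oracle, `lstar(rnn_accepts)` should yield a two-state DFA. Each `mq` call stands for one forward pass of the network. Bounded exhaustive testing grows exponentially with `max_len` and can miss disagreements on longer words. That limitation is why a more targeted equivalence check, such as the abstraction-based one the source describes, matters in practice.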