Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces
This work addresses the interpretability of RNNs for researchers in formal methods and machine learning, but it is incremental as it builds on existing algorithms for automata extraction.
The authors tackled the problem of extracting weighted finite automata (WFAs) from recurrent neural networks (RNNs) by using regression methods on state spaces to prioritize counterexample candidates, achieving improved accuracy and efficiency in the extracted WFAs as demonstrated experimentally.
We present a method to extract a weighted finite automaton (WFA) from a recurrent neural network (RNN). Our algorithm is based on the WFA learning algorithm by Balle and Mohri, which is in turn an extension of Angluin's classic \lstar algorithm. Our technical novelty is in the use of \emph{regression} methods for the so-called equivalence queries, thus exploiting the internal state space of an RNN to prioritize counterexample candidates. This way we achieve a quantitative/weighted extension of the recent work by Weiss, Goldberg and Yahav that extracts DFAs. We experimentally evaluate the accuracy, expressivity and efficiency of the extracted WFAs.