Differentiable Weighted Finite-State Transducers
This work addresses the problem of incorporating structured prior knowledge into machine learning models, particularly for sequence tasks such as handwriting and speech recognition. The contribution is incremental, extending existing WFST methods rather than replacing them.
The authors introduce a differentiable framework for weighted finite-state transducers (WFSTs) that allows them to be used dynamically during training. This lets structured loss functions and prior knowledge be integrated into learning algorithms, and the framework is validated on handwriting and speech recognition tasks.
We introduce a framework for automatic differentiation with weighted finite-state transducers (WFSTs), allowing them to be used dynamically at training time. Through the separation of graphs from operations on graphs, this framework enables the exploration of new structured loss functions, which in turn eases the encoding of prior knowledge into learning algorithms. We show how the framework can combine pruning and back-off in transition models with various sequence-level loss functions. We also show how to learn over the latent decomposition of phrases into word pieces. Finally, to demonstrate that WFSTs can be used in the interior of a deep neural network, we propose a convolutional WFST layer which maps lower-level representations to higher-level representations and can be used as a drop-in replacement for a traditional convolution. We validate these algorithms with experiments in handwriting recognition and speech recognition.
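The core idea above is that a WFST's score can act as a training objective whose gradients flow back to the arc weights. As a minimal sketch of that idea, and not the authors' framework or its API, the Python below scores an acyclic WFST in the log semiring with the forward algorithm and recovers each arc's gradient with a backward pass. The gradient of the log-sum-exp score with respect to an arc weight equals the posterior probability that an accepting path uses that arc. The function name `forward_backward`, the `(src, dst, ilabel, olabel, weight)` arc tuples, the topological state numbering, and the zero final weights are all hypothetical choices made for illustration.

```python
import math

NEG_INF = float("-inf")


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    if not xs:
        return NEG_INF
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_backward(num_states, arcs, start, finals):
    """Score an acyclic WFST in the log semiring and differentiate it.

    Hypothetical interface (an assumption, not from the paper):
      * states are integers 0..num_states-1 in topological order, so
        every arc goes from a lower- to a higher-numbered state;
      * arcs is a list of (src, dst, ilabel, olabel, weight);
      * final states carry weight 0 (log 1);
      * at least one accepting path exists.
    Returns (score, grads): score is the log of the summed exp(path weight)
    over all accepting paths, and grads[i] is d score / d weight of arc i.
    """
    # Forward pass: alpha[s] = log total weight of paths from start to s.
    alpha = [NEG_INF] * num_states
    for s in range(num_states):
        terms = [alpha[src] + w for (src, dst, _, _, w) in arcs if dst == s]
        if s == start:
            terms.append(0.0)
        alpha[s] = logsumexp(terms)

    # Backward pass: beta[s] = log total weight of paths from s to a final state.
    beta = [NEG_INF] * num_states
    for s in reversed(range(num_states)):
        terms = [w + beta[dst] for (src, dst, _, _, w) in arcs if src == s]
        if s in finals:
            terms.append(0.0)
        beta[s] = logsumexp(terms)

    score = beta[start]
    # Arc posterior = gradient of the log-sum-exp score w.r.t. that arc's weight.
    grads = [math.exp(alpha[src] + w + beta[dst] - score)
             for (src, dst, _, _, w) in arcs]
    return score, grads


# Two parallel arcs (weights log 1 and log 3) followed by one shared arc.
arcs = [
    (0, 1, "a", "a", 0.0),
    (0, 1, "b", "b", math.log(3.0)),
    (1, 2, "c", "c", 0.0),
]
score, grads = forward_backward(3, arcs, start=0, finals={2})
print(round(math.exp(score), 6))      # 4.0
print([round(g, 6) for g in grads])   # [0.25, 0.75, 1.0]
```

An explicit forward-backward pass is used here instead of general reverse-mode autodiff to keep the sketch dependency-free. A full framework of the kind the abstract describes would also provide graph operations such as composition and record them so that gradients can propagate through arbitrary pipelines of operations.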