Decoding with Finite-State Transducers on GPUs
This work addresses the need for faster decoding in NLP applications, though it is incremental as it applies existing methods to a new hardware platform.
The paper tackles the problem of parallelizing finite-state transducer algorithms on GPUs for NLP tasks, achieving decoding speedups of up to 5.2x over the authors' serial implementation and 6093x over OpenFST.
Weighted finite automata and transducers (including hidden Markov models and conditional random fields) are widely used in natural language processing (NLP) to perform tasks such as morphological analysis, part-of-speech tagging, chunking, named entity recognition, speech recognition, and others. Parallelizing finite-state algorithms on graphics processing units (GPUs) would benefit many areas of NLP. Although researchers have implemented GPU versions of basic graph algorithms, limited previous work, to our knowledge, has been done on GPU algorithms for weighted finite automata. We introduce a GPU implementation of the Viterbi and forward-backward algorithms, achieving decoding speedups of up to 5.2x over our serial implementation running on different computer architectures and 6093x over OpenFST.