Self-Attentional Models for Lattice Inputs
This work addresses the computational inefficiency of neural models for natural language processing tasks that use lattice inputs, such as speech recognition, by introducing a faster self-attentional approach. The contribution is incremental in that it adapts an existing paradigm to a specific input type.
The paper tackles the slow computation of neural lattice models by extending self-attention to handle lattice inputs. The resulting model outperforms all examined baselines on a speech translation task and is much faster than previous neural lattice models during both training and inference.
Lattices are an efficient and effective method to encode ambiguity of upstream systems in natural language processing tasks, for example to compactly capture multiple speech recognition hypotheses, or to represent multiple linguistic analyses. Previous work has extended recurrent neural networks to model lattice inputs and achieved improvements in various tasks, but these models suffer from very slow computation speeds. This paper extends the recently proposed paradigm of self-attention to handle lattice inputs. Self-attention is a sequence modeling technique that relates inputs to one another by computing pairwise similarities and has gained popularity for both its strong results and its computational efficiency. To extend such models to handle lattices, we introduce probabilistic reachability masks that incorporate lattice structure into the model and support lattice scores if available. We also propose a method for adapting positional embeddings to lattice structures. We apply the proposed model to a speech translation task and find that it outperforms all examined baselines while being much faster to compute than previous neural lattice models during both training and inference.
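The idea of a reachability mask can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a binary mask in which node i may attend to node j only if the two lie on a common path through the lattice, and it adds the log of the mask to the attention logits. Under that assumption, a probabilistic variant would put values in (0, 1] in the same mask. The function names, the toy lattice, and the toy embeddings are all hypothetical.

```python
import math


def reachability(num_nodes, edges):
    """reach[i][j] is True if i == j or a directed path leads from i to j."""
    reach = [[i == j for j in range(num_nodes)] for i in range(num_nodes)]
    for src, dst in edges:
        reach[src][dst] = True
    # Transitive closure (Warshall's algorithm).
    for k in range(num_nodes):
        for i in range(num_nodes):
            if reach[i][k]:
                for j in range(num_nodes):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def common_path_mask(num_nodes, edges):
    """mask[i][j] is 1.0 if nodes i and j share a lattice path, else 0.0."""
    reach = reachability(num_nodes, edges)
    return [
        [1.0 if reach[i][j] or reach[j][i] else 0.0 for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def masked_self_attention(queries, keys, values, mask):
    """Scaled dot-product attention with log(mask) added to the logits."""
    dim = len(queries[0])
    outputs, weights = [], []
    for i, query in enumerate(queries):
        logits = []
        for j, key in enumerate(keys):
            score = sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            if mask[i][j] > 0.0:
                logits.append(score + math.log(mask[i][j]))
            else:
                logits.append(-math.inf)  # masked-out pair gets zero weight
        # Numerically stable softmax; the diagonal is never masked,
        # so the maximum is always finite.
        top = max(logits)
        exps = [math.exp(value - top) for value in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Toy lattice: node 0 branches into alternatives 1 and 2, which rejoin at 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # toy embeddings

mask = common_path_mask(4, edges)
outputs, weights = masked_self_attention(vectors, vectors, vectors, mask)
```

In this toy lattice, nodes 1 and 2 are competing hypotheses that never occur on the same path, so the attention weight between them is exactly zero, while every other pair of nodes can attend to each other.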