Structured Reordering for Modeling Latent Alignments in Sequence Transduction
This work addresses poor systematic generalization in neural models for sequence transduction, a problem for applications that require robustness to distribution shift. Its contribution is incremental relative to traditional grammar-based approaches.
The paper tackles the failure of neural sequence-to-sequence models to generalize systematically to novel combinations of concepts. It does so by modeling segment-to-segment alignments as discrete structured latent variables within a neural seq2seq model. The resulting model exhibits better systematic generalization than standard models on synthetic problems and on NLP tasks such as semantic parsing and machine translation.
Despite success in many domains, neural models struggle in settings where train and test examples are drawn from different distributions. In particular, in contrast to humans, conventional sequence-to-sequence (seq2seq) models fail to generalize systematically, i.e., to interpret sentences representing novel combinations of concepts (e.g., text segments) seen in training. Traditional grammar formalisms excel in such settings by implicitly encoding alignments between input and output segments, but are hard to scale and maintain. Instead of engineering a grammar, we directly model segment-to-segment alignments as discrete structured latent variables within a neural seq2seq model. To efficiently explore the large space of alignments, we introduce a reorder-first align-later framework whose central component is a neural reordering module producing separable permutations. We present an efficient dynamic programming algorithm that performs exact marginal inference over separable permutations, thus enabling end-to-end differentiable training of our model. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks (i.e., semantic parsing and machine translation).
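To make the notion of a separable permutation concrete, the following is a minimal Python sketch of the standard combinatorial definition: a permutation is separable if it has length one, or if it splits into two contiguous blocks, each separable, such that every value in the left block is smaller than every value in the right block (kept in order) or larger than every value in it (inverted). The function name `is_separable` is hypothetical, and the sketch only illustrates the search space; it is not the paper's neural reordering module or its dynamic programming algorithm.

```python
from itertools import permutations


def is_separable(perm):
    """Return True if `perm` (a sequence of distinct ints) is separable.

    Illustrative brute-force check of the definition, not an efficient
    algorithm and not the method described in the abstract.
    """
    n = len(perm)
    if n <= 1:
        return True
    # Try every split point into a non-empty left and right block.
    for k in range(1, n):
        left, right = perm[:k], perm[k:]
        straight = max(left) < min(right)   # blocks keep their relative order
        inverted = min(left) > max(right)   # blocks are swapped
        if (straight or inverted) and is_separable(left) and is_separable(right):
            return True
    return False


def count_separable(n):
    """Count separable permutations of 1..n by exhaustive enumeration."""
    return sum(is_separable(p) for p in permutations(range(1, n + 1)))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_separable(n))
```

For length 4, this check accepts 22 of the 24 permutations; the two it rejects are (2, 4, 1, 3) and (3, 1, 4, 2). The restriction thus removes only a small part of the space at short lengths while giving it the recursive, tree-like structure that a dynamic program can exploit.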