CL
Oct 9, 2021

Disentangled Sequence to Sequence Learning for Compositional Generalization

arXiv:2110.04655v2644 citations
Originality: Incremental advance
AI Analysis

The paper addresses a key limitation of neural models on tasks such as semantic parsing and machine translation. Its contribution is an incremental improvement: encouraging more disentangled representations in order to boost systematic generalization.

The paper tackles the difficulty that neural sequence-to-sequence models have with compositional generalization, that is, with handling unseen combinations of known components. Its proposed method adaptively re-encodes the source input based on the target context, and the authors show that this yields more disentangled representations and improved generalization on semantic parsing and machine translation tasks.

There is mounting evidence that existing neural network models, in particular the very popular sequence-to-sequence architecture, struggle to systematically generalize to unseen compositions of seen components. We demonstrate that one of the reasons hindering compositional generalization relates to representations being entangled. We propose an extension to sequence-to-sequence models which encourages disentanglement by adaptively re-encoding (at each time step) the source input. Specifically, we condition the source representations on the newly decoded target context which makes it easier for the encoder to exploit specialized information for each prediction rather than capturing it all in a single forward pass. Experimental results on semantic parsing and machine translation empirically show that our proposal delivers more disentangled representations and better generalization.
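
The re-encoding idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hash-seeded embeddings, the single attention step standing in for an encoder, and the names `embed`, `encode`, `predict`, and `decode` are all assumptions made for illustration. The sketch only contrasts encoding the source once with re-encoding it at every step, conditioned on the target prefix decoded so far.

```python
import math
import random


def embed(token, dim=8):
    """Toy deterministic embedding (assumption): seed an RNG with the token."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(source, target_prefix, dim=8):
    """Toy encoder: each source token attends over source + target prefix.

    Returns one vector per source token, so the source representation
    depends on what has been decoded so far.
    """
    context = [embed(t, dim) for t in source + target_prefix]
    states = []
    for tok in source:
        weights = softmax([dot(embed(tok, dim), c) for c in context])
        states.append(
            [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        )
    return states


def predict(states, prefix, vocab, dim=8):
    """Toy predictor: score each vocab item against pooled states + last token."""
    last = embed(prefix[-1], dim)
    pooled = [sum(s[i] for s in states) / len(states) + last[i] for i in range(dim)]
    return max(vocab, key=lambda t: dot(pooled, embed(t, dim)))


def decode(source, vocab, max_len=5, reencode=True):
    """Greedy decoding; with reencode=True the source is re-encoded each step."""
    prefix = ["<s>"]
    states = encode(source, prefix)  # single forward pass (standard setup)
    trace = []  # source states used at each step, kept for inspection
    for _ in range(max_len):
        if reencode:
            states = encode(source, prefix)  # condition on decoded prefix
        trace.append(states)
        tok = predict(states, prefix, vocab)
        prefix.append(tok)
        if tok == "</s>":
            break
    return prefix[1:], trace
```

With `reencode=False` every step reuses the same source states, which mirrors the single forward pass the abstract argues against; with `reencode=True` the states passed to `predict` are recomputed from the growing prefix. The embeddings are random rather than learned, so the decoded tokens carry no meaning; only the control flow is of interest.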

Code Implementations: 1 repo
