CL, AI | Dec 12, 2022

Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning

arXiv:2212.05982v1 | 225 citations | h-index: 86
Originality: Incremental advance
AI Analysis

This work addresses compositional generalization for natural language processing, with incremental improvements to an existing model.

The paper tackles compositional generalization in neural networks by modifying the Disentangled sequence-to-sequence model (Dangle) to encourage more disentangled representations and to reduce its compute and memory cost. The modified model generalizes better on existing tasks and datasets and on a new machine translation benchmark introduced in the paper.

Compositional generalization is a basic mechanism in human language learning, which current neural networks struggle with. A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability by learning specialized encodings for each decoding step. We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency, allowing us to tackle compositional generalization in a more realistic setting. Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically, at some interval. Our new architecture leads to better generalization performance across existing tasks and datasets, and a new machine translation benchmark which we create by detecting naturally occurring compositional patterns in relation to a training set. We show this methodology better emulates real-world requirements than artificial challenges.
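
The re-encoding schedule described in the abstract can be illustrated with a toy decoding loop. This is only a sketch under stated assumptions, not the authors' implementation: the function name, the `interval` parameter, and the three callables are hypothetical stand-ins, and the choice to encode values exactly once is an assumption inferred from the phrase "only re-encode keys periodically".

```python
from typing import Callable, List, Sequence, Tuple


def decode_with_periodic_key_reencoding(
    source: Sequence[str],
    num_steps: int,
    interval: int,
    encode_values: Callable[[Sequence[str]], list],
    encode_keys: Callable[[Sequence[str], Sequence[str]], list],
    predict: Callable[[list, list, Sequence[str]], str],
) -> Tuple[List[str], int]:
    """Toy decoding loop (hypothetical API, not the paper's code).

    Values are encoded once from the source alone (assumption).
    Keys are re-encoded from the source plus the current target
    prefix, but only every `interval` decoding steps.
    Returns the decoded tokens and the number of key encodings.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")

    values = encode_values(source)  # computed a single time
    keys: list = []
    prefix: List[str] = []
    key_encodings = 0

    for t in range(num_steps):
        if t % interval == 0:
            # Periodic refresh: keys are conditioned on the prefix so far.
            keys = encode_keys(source, prefix)
            key_encodings += 1
        prefix.append(predict(keys, values, prefix))

    return prefix, key_encodings


# Toy stand-ins so the sketch runs end to end; a real model would use
# Transformer encoders and a decoder with cross-attention here.
def toy_encode_values(source):
    return [len(tok) for tok in source]


def toy_encode_keys(source, prefix):
    return [len(tok) + len(prefix) for tok in source]


def toy_predict(keys, values, prefix):
    return f"y{len(prefix)}"


source = ["the", "cat", "sleeps"]
outputs, n_key_encodings = decode_with_periodic_key_reencoding(
    source, 10, 3, toy_encode_values, toy_encode_keys, toy_predict
)
print(outputs, n_key_encodings)
```

With 10 decoding steps and an interval of 3, the keys are refreshed at steps 0, 3, 6, and 9, so the key encoder runs 4 times instead of 10. Setting the interval to 1 recovers re-encoding at every step, which is the costlier schedule the abstract contrasts against.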
