Making Transformers Solve Compositional Tasks
This work addresses a key limitation of Transformers on NLP tasks that require compositional reasoning, though it appears incremental: it explores design variations rather than introducing a new paradigm.
The paper tackles the failure of Transformer models to generalize compositionally in NLP tasks such as semantic parsing, and finds that specific Transformer configurations achieve state-of-the-art results on the COGS and PCFG benchmarks.
Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper, we explore the design space of Transformer models and show that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identify Transformer configurations that generalize compositionally significantly better than previously reported in the literature on a diverse set of compositional tasks, and that achieve state-of-the-art results on a semantic parsing compositional generalization benchmark (COGS) and a string edit operation composition benchmark (PCFG).
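To make "string edit operation composition" concrete, here is a minimal toy sketch in Python. It is an illustration only, not the PCFG benchmark's actual grammar or data format: the operation set (`copy`, `reverse`, `repeat`), the prefix notation, and the `evaluate` helper are all assumptions chosen for brevity. The idea it shows is that a model may see each operation in simple contexts during training, yet be tested on nested combinations it has never seen.

```python
from typing import Callable, Dict, List

Seq = List[str]

# Hypothetical toy operation set; the real benchmark defines its own operations.
UNARY: Dict[str, Callable[[Seq], Seq]] = {
    "copy": lambda xs: list(xs),
    "reverse": lambda xs: list(reversed(xs)),
    "repeat": lambda xs: list(xs) + list(xs),
}


def evaluate(tokens: Seq) -> Seq:
    """Evaluate a prefix expression: leading operation names, then a literal sequence.

    Operations are applied innermost first, i.e. the operation closest to the
    literal sequence runs first.
    """
    ops: List[str] = []
    i = 0
    # Collect the leading operation names.
    while i < len(tokens) and tokens[i] in UNARY:
        ops.append(tokens[i])
        i += 1
    result = tokens[i:]
    # Apply the operations from the innermost (last listed) outward.
    for op in reversed(ops):
        result = UNARY[op](result)
    return result


if __name__ == "__main__":
    # A simple, "training-like" example with a single operation.
    print(evaluate("reverse A B C".split()))
    # A composed, "test-like" example that nests two operations.
    print(evaluate("repeat reverse A B C".split()))
```

In this sketch, compositional generalization would mean producing the correct output for `repeat reverse A B C` after having learned `repeat` and `reverse` mostly in isolation.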