LG NE Jul 17, 2024

When can transformers compositionally generalize in-context?

Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes von Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento

DeepMind

arXiv:2407.12275v1 11.56 citations h-index: 75

Originality: Incremental advance

AI Analysis

This paper addresses compositional generalization in transformers, a problem relevant to machine learning and AI researchers. It identifies a key limitation and a remedy, though the advance is incremental because it builds on existing modular multitask settings.

The paper investigates when transformers can generalize compositionally from a subset of tasks to all combinations of tasks with shared components. It finds that transformers struggle to do so in-context despite being expressive enough in principle, and that compositional generalization becomes possible only with a bottleneck that enforces a separation between task inference and task execution.

Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.
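The combinatorial setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's data generator: it assumes, purely for illustration, that a task is identified by the set of modules it composes, and the names `enumerate_tasks` and `split_tasks` are hypothetical. It shows how the task count grows with the number of components, and how a training subset can hold out some combinations while still exposing every individual module.

```python
import itertools
import random


def enumerate_tasks(num_modules, modules_per_task):
    """List every task as a tuple of module indices (illustrative assumption:
    a task is fully identified by which modules it composes)."""
    return list(itertools.combinations(range(num_modules), modules_per_task))


def split_tasks(tasks, train_fraction, seed=0):
    """Split tasks into training and held-out combinations.

    Every module appears in at least one training task, so held-out tasks
    are new combinations of components that were all seen during training.
    """
    rng = random.Random(seed)
    shuffled = list(tasks)
    rng.shuffle(shuffled)

    train, rest, covered = [], [], set()
    # First pass: keep any task that contributes a module not yet covered.
    for task in shuffled:
        if set(task) <= covered:
            rest.append(task)
        else:
            train.append(task)
            covered |= set(task)

    # Second pass: top up the training set to the requested fraction.
    target = max(len(train), int(train_fraction * len(tasks)))
    extra = target - len(train)
    train.extend(rest[:extra])
    held_out = rest[extra:]
    return train, held_out


tasks = enumerate_tasks(num_modules=6, modules_per_task=3)
train_tasks, held_out_tasks = split_tasks(tasks, train_fraction=0.5)
print(len(tasks))  # 20 possible tasks from only 6 modules
```

Compositional generalization in this sketch would mean performing well on `held_out_tasks` after training only on `train_tasks`. The guarantee that every module is seen in training is a design choice of the sketch, made so that failure on held-out tasks cannot be attributed to an unseen component.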
