LG | Jun 27, 2023

Length Generalization in Arithmetic Transformers

Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton

arXiv:2306.15400v1 | 27.761 citations | h-index: 37

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of length generalization in transformers on arithmetic tasks. The advance is incremental: it builds on existing methods and adds a new priming technique.

The paper tackles the problem of enabling transformers to generalize, on integer arithmetic tasks, to sequences longer than those seen during training. It finds that relative position embeddings allow length generalization for addition but fail for multiplication. It then proposes train set priming, which enables models trained on 5-digit × 3-digit multiplications to generalize to 35-digit × 3-digit examples.

We examine how transformers cope with two challenges: learning basic integer arithmetic, and generalizing to longer sequences than seen during training. We find that relative position embeddings enable length generalization for simple tasks, such as addition: models trained on $5$-digit numbers can perform $15$-digit sums. However, this method fails for multiplication, and we propose train set priming: adding a few ($10$ to $50$) long sequences to the training set. We show that priming allows models trained on $5$-digit $\times$ $3$-digit multiplications to generalize to $35\times 3$ examples. We also show that models can be primed for different generalization lengths, and that the priming sample size scales as the logarithm of the training set size. Finally, we discuss potential applications of priming beyond arithmetic.
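As a rough illustration of train set priming as the abstract describes it, the sketch below builds a multiplication training set of short examples and mixes in a small number of long ones. It is not the authors' code: the function names, the `a*b` text format, the choice of operands with exactly the stated number of digits, and the default sizes are all assumptions made for the example.

```python
import random


def sample_operand(n_digits, rng):
    """Return a random integer with exactly n_digits digits (assumed sampling scheme)."""
    low = 10 ** (n_digits - 1)
    high = 10 ** n_digits - 1
    return rng.randint(low, high)


def make_example(n_digits_a, n_digits_b, rng):
    """Build one (question, answer) pair as plain strings (hypothetical format)."""
    a = sample_operand(n_digits_a, rng)
    b = sample_operand(n_digits_b, rng)
    return f"{a}*{b}", str(a * b)


def build_primed_train_set(n_train, n_priming, train_len=5,
                           target_len=35, b_len=3, seed=0):
    """Mix n_train short examples with n_priming long "priming" examples."""
    rng = random.Random(seed)
    # Bulk of the data: train_len-digit x b_len-digit multiplications.
    base = [make_example(train_len, b_len, rng) for _ in range(n_train)]
    # Priming: a handful of examples at the target generalization length.
    priming = [make_example(target_len, b_len, rng) for _ in range(n_priming)]
    data = base + priming
    rng.shuffle(data)  # interleave priming examples with the short ones
    return data


if __name__ == "__main__":
    data = build_primed_train_set(n_train=1000, n_priming=20)
    n_long = sum(1 for q, _ in data if len(q.split("*")[0]) == 35)
    print(f"{len(data)} examples, {n_long} of them priming examples")
```

Here `n_priming` stands in for the "few ($10$ to $50$) long sequences" mentioned in the abstract, and `target_len` can be changed to prime for a different generalization length.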
