LG, May 25

Length Generalization with Log-Depth Recurrent Units

arXiv:2605.26035
AI Analysis

The paper addresses the persistent challenge of length generalization in neural networks, which is crucial for reliable sequence modeling across domains.

MLP-LDRU, a Log-Depth Recurrent Unit, achieves 100% out-of-distribution accuracy on 18 of 21 regular-language tasks and at least 99.9% on the remaining 3, outperforming recurrent and attention-based models in length generalization.

Length generalization remains a persistent challenge for neural networks: recurrent models tend to suffer from positional biases, while transformers are constrained by fixed computational depth. Regular languages provide a frequently used testbed for evaluating length generalization, as label prediction can be checked for any sequence length. We propose MLP-LDRU, a type of Log-Depth Recurrent Unit, which captures a class of associativity-biased operators designed to approximate recurrence through parallel reduction. We evaluate MLP-LDRU on 21 regular-language tasks, consisting of standard benchmarks and new prefix languages, where it achieves 100% out-of-distribution accuracy on 18 tasks and, when the maximum training length is increased, at least 99.9% on the remaining 3, outperforming comparable recurrent and attention-based models. We further evaluate MLP-LDRU beyond regular languages on ListOps and NLP classification benchmarks, where it performs competitively.
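To make the idea of "recurrence through parallel reduction" concrete, the following is a minimal sketch of the general principle only, not the MLP-LDRU architecture, which the abstract does not specify. It assumes a toy regular language (bit strings with an even number of 1s) and represents each symbol as a state-transition map. Because composing such maps is associative, a balanced pairwise reduction gives the same answer as a left-to-right scan, in a number of levels that grows logarithmically with sequence length. All names here (`tree_reduce`, `compose`, `accepts_even_ones`) are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
StateMap = Tuple[int, int]  # StateMap[s] is the next state when starting in state s


def tree_reduce(items: Sequence[T], combine: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine adjacent pairs level by level; return (result, number of levels)."""
    if not items:
        raise ValueError("tree_reduce needs at least one item")
    level: List[T] = list(items)
    depth = 0
    while len(level) > 1:
        # Pair up neighbours in order, so non-commutative operators stay correct.
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired last element carries over unchanged
        level = nxt
        depth += 1
    return level[0], depth


# Assumed toy automaton: state 0 = even number of 1s seen, state 1 = odd.
IDENTITY: StateMap = (0, 1)  # reading "0" leaves the state unchanged
FLIP: StateMap = (1, 0)      # reading "1" toggles the state


def symbol_map(bit: str) -> StateMap:
    return FLIP if bit == "1" else IDENTITY


def compose(f: StateMap, g: StateMap) -> StateMap:
    """Apply f first, then g. Composition of maps is associative."""
    return (g[f[0]], g[f[1]])


def accepts_even_ones(bits: str) -> Tuple[bool, int]:
    """Return (accepted, reduction depth) using the tree reduction."""
    if not bits:
        return True, 0  # the empty string has zero 1s
    total, depth = tree_reduce([symbol_map(b) for b in bits], compose)
    return total[0] == 0, depth  # start in state 0, accept if we end in state 0


def accepts_even_ones_sequential(bits: str) -> bool:
    """Reference left-to-right recurrence for comparison."""
    state = 0
    for b in bits:
        state = symbol_map(b)[state]
    return state == 0
```

In this sketch the operator is exactly associative and fixed by hand. A learned operator, as the abstract's phrase "associativity-biased" suggests, would only approximate that property, so agreement between the tree and sequential evaluations would then be something to measure, not something guaranteed.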

