Investigating Recurrent Transformers with Dynamic Halt
This work addresses the problem of augmenting Transformer models with recurrence for tasks that require long-range dependencies. It appears incremental: it builds on existing methods rather than claiming a major breakthrough.
The paper investigates recurrent mechanisms in Transformers and proposes extensions to them, such as a dynamic halting mechanism for Universal Transformers and an augmentation of Temporal Latent Bottleneck with elements of Universal Transformers. It evaluates these models on diagnostic tasks such as Long Range Arena and flip-flop language modeling.
In this paper, we comprehensively study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism: (1) incorporating a depth-wise recurrence similar to Universal Transformers; and (2) incorporating a chunk-wise temporal recurrence like Temporal Latent Bottleneck. Furthermore, we propose and investigate novel ways to extend and combine these methods. For example, we propose a global mean-based dynamic halting mechanism for Universal Transformers and an augmentation of Temporal Latent Bottleneck with elements from Universal Transformers. We compare the models and probe their inductive biases on several diagnostic tasks, such as Long Range Arena (LRA), flip-flop language modeling, ListOps, and Logical Inference. The code is available at: https://github.com/JRC1995/InvestigatingRecurrentTransformers/tree/main
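To make the idea of depth-wise recurrence with dynamic halting more concrete, here is a minimal sketch under stated assumptions; it is not the paper's implementation. We assume an ACT-style scheme in which one shared layer is applied repeatedly and a single halting probability per step is computed from the global mean of the token states, so the whole sequence halts at the same depth. The function `ut_global_mean_halting`, the stand-in layer `h + tanh(h @ w_layer)`, and the parameters `w_halt`, `b_halt`, `max_steps`, and `threshold` are hypothetical names chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ut_global_mean_halting(h, w_layer, w_halt, b_halt, max_steps=8, threshold=0.99):
    """Hypothetical sketch of global mean-based dynamic halting (ACT-style).

    h:       (seq_len, d) token states
    w_layer: (d, d) weights of one shared layer (a stand-in for a full
             Transformer block; assumed, not from the paper)
    w_halt:  (d,) and b_halt: scalar, parameters of the assumed halting unit
    Returns the halting-weighted average of the per-step states and the
    number of recurrent steps taken.
    """
    cum_halt = 0.0
    output = np.zeros_like(h)
    for step in range(1, max_steps + 1):
        # Depth-wise recurrence: the same residual layer is reused at every step.
        h = h + np.tanh(h @ w_layer)
        # One halting probability for the whole sequence, from the mean-pooled state.
        p = float(sigmoid(h.mean(axis=0) @ w_halt + b_halt))
        if cum_halt + p >= threshold or step == max_steps:
            # Assign the remaining probability mass to the final step so the
            # step weights sum to 1, as in Adaptive Computation Time.
            output += (1.0 - cum_halt) * h
            return output, step
        cum_halt += p
        output += p * h
    raise ValueError("max_steps must be >= 1")
```

With `w_halt` set to zeros and `b_halt = 0.0`, every step emits p = 0.5, so the cumulative probability first crosses 0.99 at step 2. The design choice being illustrated is that a sequence-level (mean-pooled) halting signal gives all tokens the same depth, whereas the original Universal Transformer applies ACT separately at each position.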