On the Design Space Between Transformers and Recursive Neural Nets
This work maps the design space between RvNNs and Transformers for researchers in neural architecture. Its contribution is incremental: it builds on existing models and formalizes the connections between them.
The paper tackles the problem of connecting Recursive Neural Networks (RvNNs) and Transformers by analyzing two recent models, CRvNN and NDR, which bridge these classes and perform strongly on algorithmic tasks and in generalization settings where simpler RvNNs and Transformers fail.
In this paper, we study two classes of models, Recursive Neural Networks (RvNNs) and Transformers, and show that a tight connection between them emerges from the development of two recent models: Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR). On one hand, CRvNN pushes the boundaries of traditional RvNNs by relaxing their discrete structure-wise composition, and ends up with a Transformer-like structure. On the other hand, NDR constrains the original Transformer to induce a better structural inductive bias, ending up with a model that is close to CRvNN. Both models, CRvNN and NDR, show strong performance on algorithmic tasks and in generalization settings where simpler forms of RvNNs and Transformers fail. We explore these "bridge" models in the design space between RvNNs and Transformers, formalize their tight connections, discuss their limitations, and propose ideas for future research.