CL, AI, LG | Mar 7, 2022

HyperMixer: An MLP-based Low Cost Alternative to Transformers

Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, Francois Marelli, Francois Fleuret, James Henderson

arXiv:2203.03691v3 | 21.8230 citations | h-index: 66 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the high resource requirements of NLP models and offers practitioners a more efficient alternative. The contribution is incremental, since it builds on existing MLP-based architectures.

The paper tackles the high computational and data costs of Transformers for natural language understanding. It proposes HyperMixer, an MLP-based variant that uses hypernetworks for dynamic token mixing and performs on par with Transformers at substantially lower cost in processing time, training data, and hyperparameter tuning.

Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
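
The contrast the abstract draws between static and dynamic token mixing can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single linear layer as each hypernetwork, omits position information, and uses hypothetical names (`token_mixing_mlp`, `hyper_token_mixing`, `h1`, `h2`) and pure-Python matrices for readability. The point it shows is that the mixing weights are generated from the input tokens, so one parameter set handles any sequence length at a cost that grows linearly with that length.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian matrix; stands in for learned parameters in this sketch."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def token_mixing_mlp(tokens, w1, w2):
    """Mix information across tokens: w2 @ gelu(w1^T @ tokens).

    tokens: N x d, w1 and w2: N x d_hidden. In a static mixer, w1 and w2
    are learned parameters tied to one fixed sequence length N.
    """
    hidden = matmul(transpose(w1), tokens)               # d_hidden x d
    hidden = [[gelu(v) for v in row] for row in hidden]
    return matmul(w2, hidden)                            # N x d


def hyper_token_mixing(tokens, h1, h2):
    """Dynamic mixing: generate w1 and w2 from the tokens themselves.

    h1, h2: d x d_hidden weights of two linear hypernetworks (an assumption
    of this sketch; a real hypernetwork could be deeper and position-aware).
    """
    w1 = matmul(tokens, h1)  # N x d_hidden, one row per token
    w2 = matmul(tokens, h2)  # N x d_hidden
    return token_mixing_mlp(tokens, w1, w2)


if __name__ == "__main__":
    rng = random.Random(0)
    d, d_hidden = 4, 8
    h1 = rand_matrix(d, d_hidden, rng)
    h2 = rand_matrix(d, d_hidden, rng)
    # The same hypernetwork parameters are reused for different lengths.
    for n in (3, 7):
        out = hyper_token_mixing(rand_matrix(n, d, rng, scale=1.0), h1, h2)
        print(n, len(out), len(out[0]))
```

Every product in this sketch involves the sequence length N only once, which is where the linear cost comes from, whereas self-attention forms an N x N score matrix. Because no position information is added here, reordering the input tokens simply reorders the output rows.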
