CL · AI · LG · Feb 9, 2022

pNLP-Mixer: an Efficient all-MLP Architecture for Language

arXiv:2202.04350v2 · 224 citations
Originality: Incremental advance
AI Analysis

pNLP-Mixer enables efficient on-device NLP on resource-limited devices such as smart watches, an incremental improvement in model compression and efficiency.

The paper addresses the deployment of language models on constrained devices by introducing pNLP-Mixer, an embedding-free MLP-Mixer architecture with high weight-efficiency. A 1 MB model reaches 99.4% and 97.8% of mBERT's performance on MTOP and multiATIS, respectively, while using 170x fewer parameters, and it beats a state-of-the-art tiny model (pQRNN) by up to 7.8% on MTOP.
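To put the size figures in perspective, here is a back-of-envelope sketch of what "1 MB" and "170x fewer parameters" imply in parameter counts. It assumes 8-bit weights and 1 MB = 1,000,000 bytes; neither assumption is stated in the summary, and the function name is hypothetical.

```python
# Back-of-envelope size check for the figures quoted above.
# ASSUMPTIONS (not from the summary): weights are stored as 8-bit integers,
# and 1 MB is taken as 1,000,000 bytes.
MODEL_BYTES = 1_000_000   # "1 MB model" (from the text)
BYTES_PER_WEIGHT = 1      # assumed int8 quantization
PARAM_RATIO = 170         # "170x fewer parameters" (from the text)


def implied_param_counts(model_bytes, bytes_per_weight, ratio):
    """Return (small-model parameters, implied baseline parameters)."""
    small = model_bytes // bytes_per_weight
    return small, small * ratio


small_params, baseline_params = implied_param_counts(
    MODEL_BYTES, BYTES_PER_WEIGHT, PARAM_RATIO
)
print(f"{small_params:,} vs {baseline_params:,}")
```

Under these assumptions the small model holds roughly one million parameters and the implied baseline roughly 170 million; a different weight width would scale both numbers accordingly.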

Large pre-trained language models based on the transformer architecture have drastically changed the natural language processing (NLP) landscape. However, deploying those models for on-device applications on constrained devices such as smart watches is impractical due to their size and inference cost. As an alternative to transformer-based architectures, recent work on efficient NLP has shown that weight-efficient models can attain competitive performance on simple tasks, such as slot filling and intent classification, with model sizes on the order of one megabyte. This work introduces the pNLP-Mixer architecture, an embedding-free MLP-Mixer model for on-device NLP that achieves high weight-efficiency thanks to a novel projection layer. We evaluate a pNLP-Mixer model of only one megabyte in size on two multilingual semantic parsing datasets, MTOP and multiATIS. Our quantized model achieves 99.4% and 97.8% of the performance of mBERT on MTOP and multiATIS, respectively, while using 170x fewer parameters. Our model consistently beats the state-of-the-art tiny model (pQRNN), which is twice as large, by a margin of up to 7.8% on MTOP.
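For readers unfamiliar with the MLP-Mixer family the abstract refers to, the following is a minimal NumPy sketch of one generic Mixer block: a token-mixing MLP applied across the sequence axis, followed by a channel-mixing MLP applied across the feature axis, each with layer normalization and a skip connection. It is not the authors' implementation and does not model the paper's projection layer, which the abstract does not describe; all dimensions, initializations, and helper names are illustrative assumptions.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def layer_norm(x, eps=1e-5):
    # Normalize over the last axis (learned scale and shift omitted for brevity).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron acting on the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2


def init_mlp(rng, dim, hidden):
    # Hypothetical initialization: small Gaussian weights, zero biases.
    return (
        rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
        rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim),
    )


def mixer_block(x, token_params, channel_params):
    """Apply one Mixer block to x of shape (seq_len, channels)."""
    # Token mixing: transpose so the MLP mixes information across positions.
    y = layer_norm(x)
    x = x + mlp(y.T, *token_params).T
    # Channel mixing: the MLP mixes features within each position.
    y = layer_norm(x)
    return x + mlp(y, *channel_params)


# Illustrative sizes only; these are not taken from the paper.
rng = np.random.default_rng(0)
seq_len, channels = 8, 16
x = rng.normal(size=(seq_len, channels))
token_params = init_mlp(rng, seq_len, 32)
channel_params = init_mlp(rng, channels, 64)
out = mixer_block(x, token_params, channel_params)
print(out.shape)  # (8, 16)
```

Because both MLPs are shared across the axis they do not mix, the block's parameter count depends only on the sequence length, channel width, and hidden sizes, which is one reason this family suits small weight budgets.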

Code Implementations: 1 repo