CL · AI · Oct 13, 2021

Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese

arXiv:2110.06696v2 · 86 citations · Has Code
Originality: Incremental advance
AI Analysis

Mengzi provides more efficient, more deployable models for Chinese NLP tasks, though the work is incremental because it builds on existing pre-training methods.

The authors tackled the problem of expensive pre-trained models by developing Mengzi, a family of lightweight Chinese pre-trained models that achieve new state-of-the-art results on the CLUE benchmark.

Although pre-trained language models (PLMs) have achieved remarkable improvements across a wide range of NLP tasks, they are expensive in terms of time and resources. This calls for the study of training more efficient models that use less computation while still delivering strong performance. Instead of pursuing larger scale, we are committed to developing lightweight yet more powerful models that are trained with equal or less computation and are suited to rapid deployment. This technical report releases our pre-trained model, Mengzi, a family of discriminative, generative, domain-specific, and multimodal pre-trained model variants capable of a wide range of language and vision tasks. Compared with public Chinese PLMs, Mengzi is simple but more powerful. With our optimized pre-training and fine-tuning techniques, our lightweight model has achieved new state-of-the-art results on the widely used CLUE benchmark. Because the model architecture is unmodified, our model can easily be employed as an alternative to existing PLMs. Our sources are available at https://github.com/Langboat/Mengzi.

Code Implementations: 1 repo