Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models
This work makes zero-shot learning more accessible by reducing reliance on large-scale models and delicate prompts, offering a method that lets smaller models achieve competitive performance. The contribution is incremental, however, because it builds on existing self-supervised techniques.
The paper asks whether smaller language models can achieve strong zero-shot learning abilities without external supervised data. It presents Go-tuning, a geometry-guided self-supervised method that enables T5-small (80M) to achieve zero-shot results competitive with T5-XL (3B), and enables mgo-T5 (250M) to match the average performance of OPT (175B) on 9 datasets.
With increasing scale, large language models demonstrate both quantitative improvement and new qualitative capabilities, especially as zero-shot learners, as GPT-3 shows. However, these results rely heavily on delicate prompt design and large amounts of computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) that uses a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with those of large language models such as T5-XL (3B). We also apply Go-tuning to multi-task settings and develop a multi-task model, mgo-T5 (250M), which reaches the average performance of OPT (175B) on 9 datasets.
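For readers unfamiliar with how zero-shot classification can be framed as masked language modeling, the following is a minimal sketch of that general framing only. It is not the Go-tuning procedure, whose details are not given in this abstract. The template, the label words, and the `toy_mask_logits` scorer are all hypothetical stand-ins; a real system would query a masked language model such as T5 at the masked position instead.

```python
import math

# Hypothetical prompt template and label words (not taken from the paper).
TEMPLATE = "Review: {text} It was <mask>."
VERBALIZER = {"positive": "great", "negative": "terrible"}


def toy_mask_logits(prompt):
    """Stand-in for a masked language model (an assumption for illustration).

    Returns one score per candidate word for the masked position by
    counting simple cue words in the prompt.
    """
    cues = {"great": ("love", "wonderful"), "terrible": ("boring", "awful")}
    lowered = prompt.lower()
    return {
        word: float(sum(lowered.count(cue) for cue in cue_list))
        for word, cue_list in cues.items()
    }


def softmax(scores):
    """Normalize a dict of scores into probabilities."""
    peak = max(scores.values())
    exps = {key: math.exp(value - peak) for key, value in scores.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


def zero_shot_classify(text, mask_logits=toy_mask_logits):
    """Pick the label whose label word scores highest at the masked position."""
    prompt = TEMPLATE.format(text=text)
    logits = mask_logits(prompt)
    label_logits = {label: logits[word] for label, word in VERBALIZER.items()}
    probs = softmax(label_logits)
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    print(zero_shot_classify("I love this wonderful film."))
```

No labeled examples are used at prediction time: the class is read off from which label word the scorer prefers for the masked slot, which is why the choice of template and label words matters so much in practice.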