CL, LG | Dec 20, 2022

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models

Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown

arXiv:2212.10670v18.143 citations | h-index: 76

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enabling smaller models to perform few-shot learning efficiently. The advance is incremental because it builds on existing in-context learning techniques.

The paper tackles the problem of transferring in-context few-shot learning ability from large pre-trained language models to smaller models by introducing in-context learning distillation, which combines in-context learning and language modeling objectives. The method shows consistent improvements on benchmarks like LAMA and CrossFit, with Multitask-ICT performing better on multitask few-shot learning but requiring more computation than Meta-ICT.

Given the success with in-context learning of large pre-trained language models, we introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models. We propose to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge to the smaller models. We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask few-shot learning but also requires more computation than Meta-ICT. Our method shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks: LAMA and CrossFit. Our extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm. In-context learning objectives achieve the best performance when combined with language modeling objectives.
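The combination of objectives described in the abstract can be illustrated with a minimal sketch. It assumes a common distillation form (a weighted sum of a hard-label cross-entropy term and a soft-label KL term against the teacher) and a simple additive mix of the in-context learning and language modeling losses. The function names and the `alpha`, `beta`, and `temperature` parameters are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """Hard-label loss: negative log-probability of the gold token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """Soft-label loss: KL(teacher || student) over the vocabulary."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.5, temperature=1.0):
    """Assumed form: alpha weights the hard-label term, (1 - alpha) the soft-label term."""
    hard = cross_entropy(student_logits, target_index)
    soft = kl_divergence(teacher_logits, student_logits, temperature)
    return alpha * hard + (1 - alpha) * soft


def combined_objective(icl_loss, lm_loss, beta=1.0):
    """Assumed form: in-context learning loss plus a beta-weighted language modeling loss."""
    return icl_loss + beta * lm_loss
```

In this sketch, `distillation_loss` would be computed once on in-context inputs (demonstrations followed by a query) and once on plain language modeling inputs, and the two results passed to `combined_objective`. Setting `beta` to 0 recovers an objective that uses in-context learning alone.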

