CL IR · Feb 10, 2023

Distillation of encoder-decoder transformers for sequence labelling

Marco Farina, Duccio Pappadopulo, Anant Gupta, Leslie Huang, Ozan İrsoy, Thamar Solorio

arXiv:2302.05454v1 · 28.1268 citations · h-index: 39

Originality: Incremental advance

AI Analysis

The work offers a practical route to deploying efficient NLP models, though it is an incremental advance that builds on existing distillation work.

The paper addresses the distillation of large language models into compute-efficient models for sequence labelling. It proposes a hallucination-free tagging framework suited to distillation and reports new state-of-the-art results on multiple datasets.
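The abstract does not specify the distillation objective. As a point of reference only, the sketch below shows the classic temperature-scaled knowledge distillation loss applied per decoder step, where each step predicts one tag. It mixes a soft term (KL divergence to the teacher's softened distribution) with a hard term (cross-entropy against the gold tag). The function names, `temperature`, and `alpha` are hypothetical and this is not presented as the authors' method.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    gold_labels: Sequence[int],
    temperature: float = 2.0,  # hypothetical default
    alpha: float = 0.5,        # hypothetical weight of the soft (teacher) term
) -> float:
    """Mean per-position loss: alpha * soft term + (1 - alpha) * hard term.

    Each position is one decoder step whose output vocabulary is the tag set.
    """
    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, gold_labels):
        # Soft term: match the teacher's softened distribution; the T^2 factor
        # keeps gradient magnitudes comparable across temperatures.
        soft = kl_divergence(softmax(t, temperature), softmax(s, temperature)) * temperature ** 2
        # Hard term: ordinary cross-entropy against the gold tag at T = 1.
        hard = -math.log(softmax(s)[y])
        total += alpha * soft + (1 - alpha) * hard
    return total / len(gold_labels)
```

When the student's logits equal the teacher's, the soft term vanishes and only the weighted cross-entropy remains. In a few-shot setting the hard term could simply be dropped (`alpha = 1.0`) for unlabelled inputs scored by the teacher.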

Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop bigger language models. This race for bigger models has also underscored the need to continue the pursuit of practical distillation approaches that can leverage the knowledge acquired by these big models in a compute-efficient manner. With this goal in mind, we build on recent work to propose a hallucination-free framework for sequence tagging that is especially suited for distillation. We report new state-of-the-art results on multiple sequence labelling datasets and validate the usefulness of this framework for distilling a large model in a few-shot learning scenario.
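The abstract does not describe the framework's input and output format. The sketch below illustrates one way a seq2seq tagger can be hallucination-free by construction: the decoder emits only tags aligned to input positions, so entity text is always sliced from the input and never generated. Everything here is an assumption for illustration, including the T5-style sentinel names `<extra_id_i>`, the BIO tag scheme, and the helper names; it is not the authors' exact format.

```python
from typing import List, Sequence, Set, Tuple

OUTSIDE = "O"


def build_source(tokens: Sequence[str]) -> str:
    """Prefix every input token with a numbered sentinel (T5-style names, assumed)."""
    return " ".join(f"<extra_id_{i}> {tok}" for i, tok in enumerate(tokens))


def build_target(tags: Sequence[str]) -> str:
    """The target is one tag per input position; no input words are copied."""
    return " ".join(tags)


def constrain(prediction: Sequence[str], n_tokens: int, tag_set: Set[str]) -> List[str]:
    """Force a decoded sequence into a valid tagging: unknown tags become OUTSIDE,
    extra positions are dropped and missing positions are padded with OUTSIDE."""
    fixed = [t if t in tag_set else OUTSIDE for t in prediction[:n_tokens]]
    return fixed + [OUTSIDE] * (n_tokens - len(fixed))


def decode_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Recover (entity_text, label) pairs from BIO tags; text comes from the input."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span starts; flush the previous one, if any.
            if current:
                spans.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)  # continue the open span with the same label
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans
```

For example, with tokens `["Ada", "Lovelace", "visited", "London"]` and predicted tags `["B-PER", "I-PER", "O", "B-LOC"]`, `decode_spans` yields `[("Ada Lovelace", "PER"), ("London", "LOC")]`. Because the output alphabet is the tag set alone, a malformed prediction can at worst mislabel a token; it cannot introduce a word absent from the input.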
