CL, Aug 29, 2021

Span Fine-tuning for Pre-trained Language Models

arXiv:2108.12848v2661 citations
AI Analysis

This work gives NLP researchers and practitioners a more flexible and efficient way to add span-level information to pre-trained models. The contribution is incremental, since it builds on existing span-level pre-training methods.

The paper tackles the inflexibility and inefficiency of incorporating span-level information in pre-trained language models by introducing a span fine-tuning method that adaptively determines span settings during fine-tuning, achieving significant performance improvements on the GLUE benchmark.

Pre-trained language models (PrLMs) have to carefully manage input units when training on very large text with a vocabulary consisting of millions of words. Previous works have shown that incorporating span-level information over consecutive words in pre-training can further improve the performance of PrLMs. However, because span-level clues are introduced and fixed during pre-training, previous methods are time-consuming and lack flexibility. To alleviate this inconvenience, this paper presents a novel span fine-tuning method for PrLMs, which allows the span setting to be determined adaptively by specific downstream tasks during the fine-tuning phase. In detail, any sentence processed by the PrLM is segmented into multiple spans according to a pre-sampled dictionary. The segmentation information is then passed through a hierarchical CNN module together with the representation outputs of the PrLM to generate a span-enhanced representation. Experiments on the GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM and, at the same time, offers more flexibility in an efficient way.
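The two steps described in the abstract, dictionary-based segmentation followed by span-level aggregation of token representations, can be illustrated with a minimal sketch. This is not the paper's implementation: the greedy longest-match strategy, the `max_len` cutoff, the function names `segment_into_spans` and `pool_spans`, and the use of plain max pooling as a stand-in for the hierarchical CNN module are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int]  # half-open token index range [start, end)


def segment_into_spans(
    tokens: Sequence[str],
    span_dict: Set[Tuple[str, ...]],
    max_len: int = 4,  # assumed upper bound on span length
) -> List[Span]:
    """Split tokens into spans by greedy longest match against span_dict.

    Assumption: the pre-sampled dictionary is a set of n-gram tuples.
    Tokens that start no dictionary entry become single-token spans.
    """
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        end = i + 1  # fall back to a one-token span
        # Try the longest candidate first, down to length 2.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in span_dict:
                end = i + n
                break
        spans.append((i, end))
        i = end
    return spans


def pool_spans(
    token_vectors: Sequence[Sequence[float]],
    spans: Sequence[Span],
) -> List[List[float]]:
    """Element-wise max over the token vectors inside each span.

    Assumption: max pooling stands in for the hierarchical CNN module,
    which in the paper combines segmentation info with PrLM outputs.
    """
    return [
        [max(column) for column in zip(*token_vectors[start:end])]
        for start, end in spans
    ]
```

Given the tokens `["new", "york", "city", "is", "big"]` and a dictionary containing `("new", "york")` and `("new", "york", "city")`, `segment_into_spans` prefers the longer match and returns one three-token span followed by two single-token spans. Because the dictionary is consulted only at this stage, it can be swapped per downstream task without touching the pre-trained weights, which is the flexibility the abstract emphasizes.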
