CL · Jul 13, 2020

Do You Have the Right Scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods

arXiv:2007.06162v11002 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a practical limitation in adapting large pre-trained language models to specific text generation tasks with limited data. The contribution appears incremental, since it builds on the standard fine-tuning approach.

The paper tackles the over- and under-estimation that can arise when a pre-trained language model is fine-tuned on a small dataset for text generation. It proposes MC-Tailor, which truncates probability mass in over-estimated regions and transfers it to under-estimated ones, and reports consistent and significant improvements over plain fine-tuning.

It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimation problems. In this paper, we propose MC-Tailor, a novel method to alleviate these problems in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach. Our code is available at this url.
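The idea of moving probability mass from over-estimated to under-estimated regions can be illustrated with a toy rejection-sampling sketch. This is not the authors' implementation: the three-token distribution, the function names `tailor_by_rejection` and `tailored_distribution`, and the assumption that the target distribution is known exactly are all hypothetical, chosen only to make the mechanism visible. In a real setting the ratio between the target and model distributions would have to be estimated rather than given.

```python
import random


def tailor_by_rejection(model_probs, accept_probs, n_samples, seed=0):
    """Sample from model_probs, keeping each draw x with probability accept_probs[x]."""
    rng = random.Random(seed)
    tokens = list(model_probs)
    weights = [model_probs[t] for t in tokens]
    kept = []
    while len(kept) < n_samples:
        x = rng.choices(tokens, weights=weights, k=1)[0]
        if rng.random() < accept_probs[x]:
            kept.append(x)
    return kept


def tailored_distribution(model_probs, accept_probs):
    """Exact distribution induced by the rejection step (renormalized product)."""
    unnorm = {t: model_probs[t] * accept_probs[t] for t in model_probs}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}


# Hypothetical toy numbers: the fine-tuned model over-estimates "a"
# and under-estimates "c" relative to the (assumed known) target.
model_probs = {"a": 0.6, "b": 0.3, "c": 0.1}
target_probs = {"a": 0.4, "b": 0.3, "c": 0.3}

# Acceptance probability proportional to the ratio target/model,
# scaled so that the largest ratio is accepted with probability 1.
ratios = {t: target_probs[t] / model_probs[t] for t in model_probs}
max_ratio = max(ratios.values())
accept_probs = {t: r / max_ratio for t, r in ratios.items()}

tailored = tailored_distribution(model_probs, accept_probs)
samples = tailor_by_rejection(model_probs, accept_probs, n_samples=1000)
```

In this sketch, samples from the over-estimated token "a" are rejected more often, so after renormalization its mass shrinks and the mass of the under-estimated token "c" grows. With an exact ratio the tailored distribution matches the target; with an estimated ratio it would only approximate it.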

Code Implementations: 1 repo
