CL Oct 15, 2021

Training Dynamics for Text Summarization Models

arXiv:2110.08370v2641 citations
Originality: Incremental advance
AI Analysis

This work addresses the limited understanding of fine-tuning dynamics in summarization models. Its insights can help researchers and practitioners improve model performance, though the contribution is incremental.

The study analyzes the fine-tuning process of pre-trained language models for text summarization. It finds that copying from the input is learned early, while factual errors such as hallucination are learned later, and it proposes training modifications to improve factuality or abstractiveness.

Pre-trained language models (e.g. BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training time or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that a propensity to copy the input is learned early in the training process consistently across all datasets studied. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, though this behavior is more varied across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn and second, disregarding low-loss tokens that are learnt very quickly in the latter stages of the training process. We show that these simple training modifications allow us to configure our model to achieve different goals, such as improving factuality or improving abstractiveness.
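The two training modifications described above can be pictured as masking tokens out of the loss before averaging. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `masked_mean_loss`, the fixed thresholds `drop_above` and `drop_below`, and the example loss values are all hypothetical, and the paper may select tokens differently (for instance by stage of training).

```python
from typing import Optional, Sequence


def masked_mean_loss(
    token_losses: Sequence[float],
    drop_above: Optional[float] = None,  # hypothetical cutoff for high-loss tokens
    drop_below: Optional[float] = None,  # hypothetical cutoff for low-loss tokens
) -> float:
    """Average per-token losses, skipping tokens outside the kept range."""
    kept = [
        loss
        for loss in token_losses
        if (drop_above is None or loss <= drop_above)
        and (drop_below is None or loss >= drop_below)
    ]
    # If every token was masked out, this example contributes no loss.
    return sum(kept) / len(kept) if kept else 0.0


# Illustrative per-token losses for one reference summary (made-up values).
losses = [0.05, 0.4, 1.2, 6.0]

# Modification 1: disregard high-loss tokens that are challenging to learn.
without_hard = masked_mean_loss(losses, drop_above=5.0)

# Modification 2: disregard low-loss tokens that are learnt very quickly.
without_easy = masked_mean_loss(losses, drop_below=0.1)
```

In an actual fine-tuning loop, the per-token losses would come from the model's unreduced cross-entropy, and the masked average would replace the usual mean before backpropagation.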
