Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization
This work addresses data scarcity for abstractive summarization tasks, but it is incremental as it builds on existing techniques without introducing a fundamentally new approach.
The paper tackles data scarcity in abstractive summarization with three data manipulation techniques (synthesis, augmentation, and curriculum learning) that require no additional data. The techniques improve performance across two models and two small datasets, both in isolation and in combination.
This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We conduct experiments to show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance when applied in isolation and when combined.
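To make the curriculum idea concrete, here is a minimal sketch of ordering training pairs from easy to hard by an abstractiveness score. The abstract does not define the metric, so the score used here (the fraction of summary n-grams that do not appear in the source) is an assumption chosen for illustration, not necessarily the paper's definition. The names `abstractiveness` and `curriculum_order` are hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def abstractiveness(source: str, summary: str, n: int = 2) -> float:
    """Fraction of summary n-grams absent from the source.

    Assumed proxy for difficulty: 0.0 means fully extractive,
    1.0 means no n-gram is copied from the source.
    """
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0  # summary too short to form an n-gram
    source_ngrams = ngrams(source.lower().split(), n)
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)


def curriculum_order(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (source, summary) pairs from least to most abstractive."""
    return sorted(pairs, key=lambda p: abstractiveness(p[0], p[1]))


pairs = [
    ("the cat sat on the mat", "a feline rested"),  # fully novel bigrams
    ("the cat sat on the mat", "the cat sat"),      # fully copied bigrams
]
ordered = curriculum_order(pairs)
```

With this ordering, the extractive pair would be presented before the abstractive one. A specificity-based metric could be substituted by replacing the sort key.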