CLJun 21, 2024

Word Matters: What Influences Domain Adaptation in Summarization?

arXiv:2406.14828v129 citations
Originality Incremental advance
AI Analysis

It addresses the problem of improving domain adaptation for summarization tasks, offering a method to predict performance without training, but it is incremental as it builds on existing domain adaptation concepts.

This paper investigates fine-grained factors affecting domain adaptation performance in summarization tasks, finding that cross-domain overlap and performance gain have an approximate linear relationship when considering dataset learning difficulty, measured by word-based compression rate and abstraction level, enabling prediction of model performance on unknown domains without training.

Domain adaptation aims to enable Large Language Models (LLMs) to generalize domain datasets unseen effectively during the training phase. However, factors such as the size of the model parameters and the scale of training data are general influencers and do not reflect the nuances of domain adaptation performance. This paper investigates the fine-grained factors affecting domain adaptation performance, analyzing the specific impact of `words' in training data on summarization tasks. We propose quantifying dataset learning difficulty as the learning difficulty of generative summarization, which is determined by two indicators: word-based compression rate and abstraction level. Our experiments conclude that, when considering dataset learning difficulty, the cross-domain overlap and the performance gain in summarization tasks exhibit an approximate linear relationship, which is not directly related to the number of words. Based on this finding, predicting a model's performance on unknown domain datasets is possible without undergoing training.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes