CL | AI | Aug 20, 2025

Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

arXiv:2508.14723v32 citations | h-index: 5 | EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses limited diversity and control in text data augmentation, a problem faced by deep learning practitioners. It positions its approach as a new paradigm rather than an incremental refinement of existing methods.

The paper tackles the difficulty of controlling style and structure when augmenting text data with large language models. It proposes LMTransplant, a transplant-then-regenerate paradigm: the seed text is placed into a context expanded by the LLM, and the LLM then regenerates a content-level variant from that expanded context. The authors report better performance than existing augmentation methods and strong scalability as the amount of augmented data grows.

Data augmentation is a critical technique in deep learning. Traditional methods like back-translation typically focus on lexical-level rephrasing, which primarily produces variations with the same semantics. While large language models (LLMs) have enhanced text augmentation through their "knowledge emergence" capability, controlling the style and structure of their outputs remains challenging and requires meticulous prompt engineering. In this paper, we propose LMTransplant, a novel text augmentation paradigm leveraging LLMs. The core idea of LMTransplant is transplant-then-regenerate: incorporating seed text into a context expanded by an LLM, and asking the LLM to regenerate a variant based on the expanded context. This strategy allows the model to create more diverse and creative content-level variants by fully leveraging the knowledge embedded in LLMs, while preserving the core attributes of the original text. We evaluate LMTransplant across various text-related tasks, demonstrating its superior performance over existing text augmentation methods. Moreover, LMTransplant demonstrates exceptional scalability as the size of augmented data grows.
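
The transplant-then-regenerate idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the function names (`expand_context`, `regenerate`, `transplant_then_regenerate`), and the `LLM` callable interface are all assumptions made for illustration, and the abstract does not say whether the regeneration prompt also sees the seed text. Here the sketch assumes it sees only the expanded context.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


def expand_context(seed: str, llm: LLM) -> Tuple[str, str]:
    """Transplant step (assumed form): ask the LLM for text that could
    surround the seed, giving a preceding and a following passage."""
    before = llm(
        "Write a short passage that could naturally come right before "
        f"the following text:\n{seed}"
    )
    after = llm(
        "Write a short passage that could naturally come right after "
        f"the following text:\n{seed}"
    )
    return before.strip(), after.strip()


def regenerate(before: str, after: str, llm: LLM) -> str:
    """Regenerate step (assumed form): ask the LLM to write a new middle
    passage that fits between the expanded context passages."""
    prompt = (
        "Write the missing middle passage so that it fits the context.\n"
        f"Preceding context:\n{before}\n\n"
        "[MISSING PASSAGE]\n\n"
        f"Following context:\n{after}"
    )
    return llm(prompt).strip()


def transplant_then_regenerate(seed: str, llm: LLM, n_variants: int = 1) -> List[str]:
    """Produce n_variants augmented texts for a single seed text."""
    variants = []
    for _ in range(n_variants):
        before, after = expand_context(seed, llm)
        variants.append(regenerate(before, after, llm))
    return variants
```

Any chat or completion client can be wrapped as the `llm` callable. Each variant costs three model calls in this sketch: two to expand the context and one to regenerate.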

