CL | May 25, 2022

Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation

Berkeley, DeepMind
arXiv:2205.12647v2 | 317 citations | h-index: 48
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of making generative models work across languages without parallel data, which matters for applications that serve many languages. The advance is incremental, as it builds on existing adaptation techniques.

The paper tackles catastrophic forgetting in zero-shot cross-lingual generation, using summarization as the case study: a multilingual model fine-tuned only on English data loses the ability to generate in other languages. It finds that parameter-efficient prompt tuning improves transfer over standard fine-tuning, especially between less-related languages such as English and Thai, but still lags behind fully supervised baselines.

In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study. We assume a strict setting with no access to parallel data or machine translation and find that common transfer learning approaches struggle in this setting, as a generative multilingual model fine-tuned purely on English catastrophically forgets how to generate non-English. Given the recent rise of parameter-efficient adaptation techniques, we conduct the first investigation into how one such method, prompt tuning (Lester et al., 2021), can overcome catastrophic forgetting to enable zero-shot cross-lingual generation. Our experiments show that parameter-efficient prompt tuning provides gains over standard fine-tuning when transferring between less-related languages, e.g., from English to Thai. However, a significant gap still remains between these methods and fully-supervised baselines. To improve cross-lingual transfer further, we explore several approaches, including: (1) mixing in unlabeled multilingual data, and (2) explicitly factoring prompts into recombinable language and task components. Our approaches can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
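
As a rough illustration of the second idea in the abstract, factoring prompts into recombinable language and task components, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how the components are combined, so concatenation is an assumption, and the names `init_prompt`, `compose_prompt`, `prepend_prompt`, the prompt lengths, and `EMBED_DIM` are all hypothetical. The backbone model is left out; in prompt tuning its weights stay frozen and only the prompt vectors are trained.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size, kept tiny for illustration


def init_prompt(length, dim, seed):
    """Create a soft prompt: `length` trainable vectors of size `dim`."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(length)]


# Trainable prompt components (assumed layout, not taken from the paper).
language_prompts = {
    "en": init_prompt(2, EMBED_DIM, seed=0),
    "th": init_prompt(2, EMBED_DIM, seed=1),
}
task_prompts = {
    "summarize": init_prompt(3, EMBED_DIM, seed=2),
}


def compose_prompt(language, task):
    """Combine a language component and a task component by concatenation
    (an assumption made for this sketch)."""
    return language_prompts[language] + task_prompts[task]


def prepend_prompt(prompt, input_embeddings):
    """Prepend the soft prompt to the embedded input sequence that would be
    fed to the frozen model."""
    return prompt + input_embeddings


# Zero-shot recombination: a task component that would be trained with English
# labels is paired with the Thai language component at inference time.
thai_input = [[0.0] * EMBED_DIM for _ in range(5)]  # stand-in for embedded Thai tokens
model_input = prepend_prompt(compose_prompt("th", "summarize"), thai_input)
print(len(model_input))  # 2 language + 3 task + 5 input vectors = 10
```

In this sketch, swapping `"th"` for `"en"` changes only the first two vectors of the composed prompt, which is the sense in which the components are recombinable.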

Code Implementations: 1 repo