CL · Dec 18, 2024

Curriculum Learning for Cross-Lingual Data-to-Text Generation With Noisy Data

arXiv:2412.13484v1 · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses data-to-text generation across multiple languages when the training data is noisy. The advance is incremental: it adapts existing curriculum learning methods to a cross-lingual, noisy-data setting.

The paper tackles cross-lingual data-to-text generation with noisy data by applying curriculum learning with specific difficulty criteria and schedules. Using the alignment score criterion with an annealing schedule, it reports a BLEU score increase of up to 4 points and improvements in faithfulness and coverage of 5-15% on average across 11 Indian languages and English on two datasets.

Curriculum learning has been used in various tasks to improve the quality of text generation systems by ordering the training samples according to a particular schedule. In the context of data-to-text generation (DTG), previous studies used various difficulty criteria to order the training samples for monolingual DTG. These criteria, however, do not generalize to the cross-lingual variant of the problem and do not account for noisy data. We explore multiple criteria that can be used to improve the performance of cross-lingual DTG systems with noisy data using two curriculum schedules. Using the alignment score criterion for ordering samples and an annealing schedule to train the model, we show an increase in BLEU score of up to 4 points, and improvements in faithfulness and coverage of generations of 5-15% on average across 11 Indian languages and English on 2 separate datasets. We make code and data publicly available.
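
To make the idea of "ordering samples by a criterion and training with a schedule" concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not define the alignment score or the annealing schedule, so the code assumes a precomputed score in [0, 1] per sample (higher meaning better-aligned data and text) and assumes an annealing schedule that starts from the full noisy dataset and linearly shrinks the training pool to the best-aligned fraction. The names `order_by_alignment`, `annealed_fraction`, `curriculum_pool`, and the `final_fraction` parameter are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (sample id, alignment score). The score is a hypothetical, precomputed
# value in [0, 1]; the abstract does not say how it is calculated.
Sample = Tuple[str, float]


def order_by_alignment(samples: Sequence[Sample]) -> List[Sample]:
    """Sort samples from best-aligned (assumed cleanest) to worst-aligned."""
    return sorted(samples, key=lambda s: s[1], reverse=True)


def annealed_fraction(epoch: int, num_epochs: int,
                      final_fraction: float = 0.25) -> float:
    """Assumed schedule: keep 100% of the data at epoch 0 and shrink
    linearly to `final_fraction` at the last epoch."""
    if num_epochs <= 1:
        return final_fraction
    progress = epoch / (num_epochs - 1)
    return 1.0 - progress * (1.0 - final_fraction)


def curriculum_pool(samples: Sequence[Sample], epoch: int, num_epochs: int,
                    final_fraction: float = 0.25) -> List[Sample]:
    """Return the samples to train on in the given epoch: the top-scoring
    slice of the ordered data, sized by the annealed fraction."""
    ordered = order_by_alignment(samples)
    fraction = annealed_fraction(epoch, num_epochs, final_fraction)
    keep = max(1, math.ceil(fraction * len(ordered)))
    return ordered[:keep]
```

In a training loop, one would call `curriculum_pool(samples, epoch, num_epochs)` at the start of each epoch and fine-tune on the returned subset, so that later epochs concentrate on the pairs assumed to be least noisy.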

