CL, AI
Aug 9, 2022

High Recall Data-to-text Generation with Progressive Edit

arXiv:2208.04558v1
1 citation
h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses data-to-text generation, offering an incremental improvement: it exploits a specific behavior of Transformer models, which the authors call "Asymmetric Generation", to raise recall.

The paper tackles the problem of data-to-text generation by addressing 'Asymmetric Generation' in Transformer models, where repeated target sentences lead to outputs of varying length and quality. By introducing Progressive Edit (ProEdit), which combines asymmetric sentences with non-repeated targets, the method achieves a new state-of-the-art result on the ToTTo dataset, improving recall to better cover structured inputs.

Data-to-text (D2T) generation is the task of generating text from structured inputs. We observed that when the same target sentence is repeated twice, a Transformer-based (T5) model generates an output made up of asymmetric sentences from structured inputs. In other words, these sentences differ in length and quality. We call this phenomenon "Asymmetric Generation" and exploit it in D2T generation. Once asymmetric sentences are generated, we combine the first part of the output with a non-repeated target. As this goes through progressive edit (ProEdit), recall increases. Hence, this method covers structured inputs better than before editing. ProEdit is a simple but effective way to improve performance in D2T generation, and it achieves a new state-of-the-art result on the ToTTo dataset.
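The target construction described in the abstract can be sketched roughly as follows. This is only an illustration of the data-preparation idea, not the authors' implementation: the function names (`repeat_target`, `first_part`, `proedit_target`), the single-space separator, and the naive sentence split are all assumptions, since the abstract does not specify them. No model is called; `model_output` stands in for whatever a T5 model trained on repeated targets would produce.

```python
SEP = " "  # assumed separator between sentences; not specified in the abstract


def repeat_target(target: str, sep: str = SEP) -> str:
    """Build a training target that states the reference sentence twice."""
    return f"{target}{sep}{target}"


def first_part(output: str) -> str:
    """Return the first sentence of a model output (naive split on '. ')."""
    head, _, _ = output.partition(". ")
    # partition drops the period when a split happens, so restore it
    return head if head.endswith(".") else head + "."


def proedit_target(model_output: str, target: str, sep: str = SEP) -> str:
    """Combine the first part of an asymmetric output with the non-repeated target."""
    return f"{first_part(model_output)}{sep}{target}"


if __name__ == "__main__":
    reference = "Ref sentence."
    print(repeat_target(reference))
    # Hypothetical asymmetric output: two sentences of different length
    print(proedit_target("Short one. A longer second one.", reference))
```

Under these assumptions, the string from `proedit_target` would serve as the training target for the next editing round, with the process repeated progressively.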

