CLFeb 15, 2021

MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training

arXiv:2102.07380v2
Originality Incremental advance
AI Analysis

This addresses the data scarcity issue in spoken-text normalization for applications like machine translation, though it is incremental as it adapts self-supervised learning to a specific network type.

The paper tackles the problem of spoken-text normalization, which converts spoken-style text to normalized text for downstream tasks, by proposing MAPGN, a self-supervised learning method for pointer-generator networks that improves performance with limited paired data, achieving better results than conventional methods in two tasks.

This paper presents a self-supervised learning method for pointer-generator networks to improve spoken-text normalization. Spoken-text normalization that converts spoken-style text into style normalized text is becoming an important technology for improving subsequent processing such as machine translation and summarization. The most successful spoken-text normalization method to date is sequence-to-sequence (seq2seq) mapping using pointer-generator networks that possess a copy mechanism from an input sequence. However, these models require a large amount of paired data of spoken-style text and style normalized text, and it is difficult to prepare such a volume of data. In order to construct spoken-text normalization model from the limited paired data, we focus on self-supervised learning which can utilize unpaired text data to improve seq2seq models. Unfortunately, conventional self-supervised learning methods do not assume that pointer-generator networks are utilized. Therefore, we propose a novel self-supervised learning method, MAsked Pointer-Generator Network (MAPGN). The proposed method can effectively pre-train the pointer-generator network by learning to fill masked tokens using the copy mechanism. Our experiments demonstrate that MAPGN is more effective for pointer-generator networks than the conventional self-supervised learning methods in two spoken-text normalization tasks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes