CLAug 23, 2021

A Unified Transformer-based Framework for Duplex Text Normalization

arXiv:2108.09889v112 citations
Originality Incremental advance
AI Analysis

This reduces technical complexity and maintenance costs for spoken dialog systems by unifying TN and ITN into a single model, though it is incremental as it builds on existing neural methods.

The paper tackles the problem of separate models for text normalization (TN) and inverse text normalization (ITN) in spoken dialog systems by proposing a unified Transformer-based framework that handles both tasks simultaneously, achieving state-of-the-art results on Google TN datasets for English and Russian and over 95% sentence-level accuracy on an internal English dataset.

Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or ITN, ranging from weighted finite-state transducers to neural networks. Despite their impressive performance, these methods aim to tackle only one of the two tasks but not both. As a result, in a complete spoken dialog system, two separate models for TN and ITN need to be built. This heterogeneity increases the technical complexity of the system, which in turn increases the cost of maintenance in a production setting. Motivated by this observation, we propose a unified framework for building a single neural duplex system that can simultaneously handle TN and ITN. Combined with a simple but effective data augmentation method, our systems achieve state-of-the-art results on the Google TN dataset for English and Russian. They can also reach over 95% sentence-level accuracy on an internal English TN dataset without any additional fine-tuning. In addition, we also create a cleaned dataset from the Spoken Wikipedia Corpora for German and report the performance of our systems on the dataset. Overall, experimental results demonstrate the proposed duplex text normalization framework is highly effective and applicable to a range of domains and languages

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes