CL · Apr 15, 2021

Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

arXiv:2104.07777v1728 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a challenge developers face: deploying TTS systems quickly in multiple languages. The advance is incremental: it builds on existing text normalization (TN) methods, with a focus on data efficiency.

The paper tackles the problem of developing Text Normalization (TN) systems for Text-to-Speech (TTS) in new languages with limited data. It proposes a novel architecture that, on English, uses less than 3% of the data used by state-of-the-art methods while achieving comparable performance, and it reports the first TN results for Spanish and Tamil.

Developing Text Normalization (TN) systems for Text-to-Speech (TTS) in new languages is hard. We propose a novel architecture that facilitates this for multiple languages while using less than 3% of the data used by the state-of-the-art results on English. We treat TN as a sequence classification problem and propose a granular tokenization mechanism that enables the system to learn the majority of the classes and their normalizations from the training data itself. This is combined with minimal precoded linguistic knowledge for the remaining classes. We publish the first results on TN for TTS in Spanish and Tamil and demonstrate that the approach performs comparably to previous work on English. All annotated datasets used for experimentation will be released at https://github.com/amazon-research/proteno.
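To make the idea of granular tokenization followed by per-token classification more concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the tokenization rules, the class inventory, or the classifier, so everything here is an assumption for illustration: the function names (`granular_tokenize`, `train`, `normalize`), the class labels, the rule of splitting wherever the character type changes, and the digit-by-digit fallback standing in for "precoded linguistic knowledge". A frequency lookup table stands in for a learned sequence classifier.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumed tokenization rule (hypothetical): emit runs of letters, runs of
# digits, and single punctuation marks as separate tokens, so "5km" -> "5", "km".
_TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def granular_tokenize(text: str) -> List[str]:
    """Split text into fine-grained tokens by character type."""
    return _TOKEN_RE.findall(text)


def train(examples: List[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Toy stand-in for a classifier: keep the most frequent
    (class, spoken form) pair seen for each lower-cased token."""
    counts = defaultdict(Counter)
    for token, label, spoken in examples:
        counts[token.lower()][(label, spoken)] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def spell_digits(token: str) -> str:
    """Hand-coded fallback rule (assumption): read digits one by one."""
    return " ".join(DIGIT_WORDS[int(ch)] for ch in token)


def normalize(text: str, model: Dict[str, Tuple[str, str]]) -> str:
    out = []
    for tok in granular_tokenize(text):
        key = tok.lower()
        if key in model:            # class and spoken form seen in training data
            out.append(model[key][1])
        elif tok.isdecimal():       # precoded rule for an unseen numeric token
            out.append(spell_digits(tok))
        else:                       # default: leave the token unchanged
            out.append(tok)
    return " ".join(out)


# Hypothetical annotated examples: (token, class, spoken form).
examples = [
    ("Dr", "ABBREVIATION", "doctor"),
    ("km", "UNIT", "kilometers"),
    ("&", "SYMBOL", "and"),
]
model = train(examples)
print(normalize("Dr Smith ran 5km & rested", model))
```

Because the tokenizer separates "5" from "km", the unit and the number can be handled independently, which is one plausible reason a fine-grained split would reduce the amount of annotated data needed. A real system would replace the lookup table with a model that classifies each token in context.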

Code Implementations: 1 repo
