CL · Dec 12, 2021

Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks

arXiv:2112.06327v1 · 14 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses data scarcity in Code-switching language modeling; its contribution is an incremental improvement in a domain-specific area.

The paper tackles the problem of data scarcity in Code-switching language modeling by artificially generating Code-switching text using a cycle-consistent adversarial networks framework, resulting in consistent improvements in language model and automatic speech recognition performance on the SEAME corpus.

This paper presents our latest effort on improving Code-switching language models that suffer from data scarcity. We investigate methods to augment Code-switching training text data by generating it artificially. Concretely, we propose a framework based on cycle-consistent adversarial networks that transfers monolingual text into Code-switching text, treating Code-switching as a speaking style. Our experimental results on the SEAME corpus show that utilising artificially generated Code-switching text data consistently improves both the language model and automatic speech recognition performance.
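
For readers unfamiliar with cycle-consistent adversarial training, the generic objective can be sketched as follows. This is the standard formulation from the image-translation literature, not necessarily the exact loss used in this paper. The symbols are assumptions introduced here for illustration: G maps monolingual text to Code-switching text, F maps back, D_C and D_M are the discriminators on each side, d is an unspecified reconstruction distance, and \lambda is a weighting hyperparameter. How the paper handles discrete text in these terms is not stated in the abstract.

```latex
% Adversarial term for the monolingual -> Code-switching direction
\mathcal{L}_{\mathrm{GAN}}(G, D_C) =
  \mathbb{E}_{c \sim p_C}\big[\log D_C(c)\big]
  + \mathbb{E}_{m \sim p_M}\big[\log\big(1 - D_C(G(m))\big)\big]

% Cycle-consistency term: translating there and back should recover the input
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{m \sim p_M}\big[d\big(F(G(m)),\, m\big)\big]
  + \mathbb{E}_{c \sim p_C}\big[d\big(G(F(c)),\, c\big)\big]

% Full objective (assumed weighting \lambda)
\mathcal{L}(G, F, D_M, D_C) =
  \mathcal{L}_{\mathrm{GAN}}(G, D_C)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_M)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

The cycle term is what allows training without parallel monolingual/Code-switching sentence pairs, which fits the data-scarcity setting described above.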
