Word-Free Spoken Language Understanding for Mandarin-Chinese
This work addresses the need for more efficient SLU systems in spoken dialogue applications such as Siri and Alexa, particularly for languages like Mandarin Chinese. However, the contribution appears incremental, since it builds on existing Transformer and phone-based methods.
The authors tackle the dependence of spoken language understanding (SLU) systems on automatic speech recognition (ASR) modules, which require large amounts of language-specific training data. They propose a Transformer-based SLU system that works directly on phones without ASR, and they verify its effectiveness on a Mandarin Chinese intent classification dataset.
Spoken dialogue systems such as Siri and Alexa bring great convenience to everyday life. However, current spoken language understanding (SLU) pipelines largely depend on automatic speech recognition (ASR) modules, which require a large amount of language-specific training data. In this paper, we propose a Transformer-based SLU system that works directly on phones. This acoustic-based SLU system consists of only two blocks and does not require an ASR module. The first block is a universal phone recognition system, and the second block is a Transformer-based language model for phones. We verify the effectiveness of the system on an intent classification dataset in Mandarin Chinese.
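
To make the hand-off between the two blocks concrete, the following is a minimal sketch of how phone sequences from the first block might be turned into fixed-length integer inputs for the second block. It is an illustration under stated assumptions, not the authors' implementation: the special tokens (`<pad>`, `<cls>`, `<unk>`), the helper names `build_phone_vocab`, `encode_phones`, and `attention_mask`, the maximum length, and the example phone strings are all hypothetical. The phone recognizer and the Transformer themselves are not shown.

```python
from typing import Dict, List, Sequence

# Special tokens are an assumption of this sketch; the abstract does not specify them.
PAD, CLS, UNK = "<pad>", "<cls>", "<unk>"


def build_phone_vocab(phone_sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct phone, after the reserved special tokens."""
    vocab = {PAD: 0, CLS: 1, UNK: 2}
    for seq in phone_sequences:
        for phone in seq:
            if phone not in vocab:
                vocab[phone] = len(vocab)
    return vocab


def encode_phones(phones: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Prepend <cls>, map phones to ids (unknown phones become <unk>), then truncate or pad."""
    ids = [vocab[CLS]] + [vocab.get(p, vocab[UNK]) for p in phones]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def attention_mask(ids: Sequence[int], pad_id: int = 0) -> List[int]:
    """Return 1 for real tokens and 0 for padding positions."""
    return [0 if i == pad_id else 1 for i in ids]


# Hypothetical output of the phone recognition block for two utterances.
# These phone strings are made up for illustration and are not taken from the dataset.
block1_output = [
    ["n", "i", "x", "a", "o"],
    ["d", "a", "k", "a", "i"],
]

vocab = build_phone_vocab(block1_output)
batch = [encode_phones(seq, vocab, max_len=8) for seq in block1_output]
masks = [attention_mask(ids) for ids in batch]
```

In a setup like this, `batch` and `masks` would be passed to a Transformer encoder over phone ids, and the representation at the `<cls>` position could feed an intent classifier. Because the vocabulary consists of phones rather than words, this step needs no word segmentation or language-specific lexicon.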