CL · SD · AS · Jul 6, 2022

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

arXiv:2207.02663v1 · 1 citation · h-index: 25
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of datasets for low-resource languages such as Cantonese in in-car smart assistants. The contribution is incremental: it provides a new benchmark but does not introduce novel methods.

The authors address data scarcity for low-resource languages in in-car speech recognition by collecting a new Cantonese audio-visual dataset (CI-AVSR) and proposing a Kaggle competition built on it.

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, in this research field, most datasets are in major languages, such as English and Chinese. There is a huge data scarcity issue for low-resource languages, hindering the development of research and applications for broader communities. Therefore, it is crucial to have more benchmarks to raise awareness and motivate the research in low-resource languages. To mitigate this problem, we collect a new dataset, namely Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car speech recognition in the Cantonese language with video and audio data. Together with it, we propose Cantonese Audio-Visual Speech Recognition for In-car Commands as a new challenge for the community to tackle low-resource speech recognition under in-car scenarios.
