CL · Dec 12, 2023

The GUA-Speech System Description for CNVSRC Challenge 2023

Shengqiang Li, Chao Lei, Baozhong Ma, Binbin Zhang, Fuping Pan

arXiv:2312.07254

Originality: Synthesis-oriented

AI Analysis

This work addresses visual speech recognition for Chinese speakers, representing an incremental improvement in a domain-specific challenge.

The authors tackle visual speech recognition for Chinese with a system that combines Inter CTC residual modules, a bi-transformer decoder, and Chinese-character modeling units. The system reaches a character error rate (CER) of 38.09% on the Eval set, a 21.63% relative reduction over the official baseline, which earned second place in the CNVSRC 2023 challenge.
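The two reported figures are linked by simple arithmetic. The sketch below is a minimal Python illustration, not the challenge's official scorer: it computes corpus-level CER as total character edit distance divided by total reference characters, and backs out the baseline CER implied by a 38.09% system CER and a 21.63% relative reduction (about 48.60%, subject to rounding of the reported numbers). The function names are hypothetical, and the official CNVSRC scoring may normalize text before counting errors.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (r != h))  # substitution (or match at cost 0)
        prev = cur
    return prev[-1]


def cer(refs, hyps) -> float:
    """Corpus-level CER: total character edits over total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def relative_reduction(baseline: float, system: float) -> float:
    """Fraction of the baseline error removed by the system."""
    return (baseline - system) / baseline


def implied_baseline(system: float, rel: float) -> float:
    """Baseline error rate implied by a system error rate and a relative reduction."""
    return system / (1.0 - rel)


# Each Chinese character counts as one unit: two substitutions out of four characters.
print(cer(["你好世界"], ["你好时间"]))                 # 0.5
print(round(implied_baseline(38.09, 0.2163), 2))    # 48.6
```

Because the modeling units are Chinese characters, the hypothesis and reference strings can be compared character by character without any word segmentation.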

This study describes our system for the fixed track of Task 1, Single-speaker Visual Speech Recognition (VSR), in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023. Specifically, we use intermediate connectionist temporal classification (Inter CTC) residual modules to relax the conditional independence assumption of CTC in our model. We then use a bi-transformer decoder so that the model can capture both past and future contextual information. In addition, we use Chinese characters as the modeling units to improve recognition accuracy. Finally, we use a recurrent neural network language model (RNNLM) for shallow fusion at inference time. Experiments show that our system achieves a character error rate (CER) of 38.09% on the Eval set, a relative CER reduction of 21.63% over the official baseline, and it placed second in the challenge.
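The model components named above can be illustrated with a small pure-Python sketch. It assumes the self-conditioned CTC formulation commonly used for Inter CTC residual modules: an intermediate encoder output is projected to vocabulary posteriors (which would also feed an auxiliary CTC loss, omitted here), and those posteriors are projected back to the model dimension and added to the hidden states before the next encoder layer. The two scoring functions show one common way to interpolate left-to-right and right-to-left decoder scores with a CTC score, and to add a weighted RNNLM log-probability during shallow fusion. All function names, weights, and matrix shapes are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical sketch; not the authors' implementation.


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def inter_ctc_residual(x: Matrix, w_out: Matrix, w_in: Matrix) -> Tuple[Matrix, Matrix]:
    """Self-conditioning step (assumed formulation, biases omitted).

    x:     T x D intermediate encoder output
    w_out: D x V projection to vocabulary logits
    w_in:  V x D projection back to the model dimension
    Returns the T x V posteriors (input to an intermediate CTC loss)
    and the T x D conditioned states passed to the next encoder layer.
    """
    posteriors = softmax_rows(matmul(x, w_out))
    feedback = matmul(posteriors, w_in)
    conditioned = [[a + b for a, b in zip(xr, fr)] for xr, fr in zip(x, feedback)]
    return posteriors, conditioned


def bidirectional_rescore(l2r: float, r2l: float, ctc: float,
                          reverse_weight: float, ctc_weight: float) -> float:
    """Interpolate left-to-right and right-to-left decoder log-probs, plus a weighted CTC score."""
    return (1.0 - reverse_weight) * l2r + reverse_weight * r2l + ctc_weight * ctc


def shallow_fusion_score(log_p_model: float, log_p_lm: float, lm_weight: float) -> float:
    """Shallow fusion: add a weighted external LM log-probability to the model score."""
    return log_p_model + lm_weight * log_p_lm
```

For example, with `lm_weight=0.5`, a hypothesis with model log-probability -2.0 and LM log-probability -4.0 scores -4.0. The weights in both scoring functions are typically tuned on a development set rather than fixed in advance.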

