Guodong Ma

CL
h-index4
3papers
7citations
Novelty42%
AI Score27

3 Papers

CLJan 22, 2025
BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR

Guodong Ma, Wenxuan Wang, Lifeng Zhou et al.

Recently, the Mixture of Expert (MoE) architecture, such as LR-MoE, is often used to alleviate the impact of language confusion on the multilingual ASR (MASR) task. However, it still faces language confusion issues, especially in mismatched domain scenarios. In this paper, we decouple language confusion in LR-MoE into confusion in self-attention and router. To alleviate the language confusion in self-attention, based on LR-MoE, we propose to apply attention-MoE architecture for MASR. In our new architecture, MoE is utilized not only on feed-forward network (FFN) but also on self-attention. In addition, to improve the robustness of the LID-based router on language confusion, we propose expert pruning and router augmentation methods. Combining the above, we get the boosted language-routing MoE (BLR-MoE) architecture. We verify the effectiveness of the proposed BLR-MoE in a 10,000-hour MASR dataset.

SDDec 13, 2021
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition

Guodong Ma, Pengfei Hu, Nurmemet Yolwas et al.

Consonant and vowel reduction are often encountered in speech, which might cause performance degradation in automatic speech recognition (ASR). Our recently proposed learning strategy based on masking, Phone Masking Training (PMT), alleviates the impact of such phenomenon in Uyghur ASR. Although PMT achieves remarkably improvements, there still exists room for further gains due to the granularity mismatch between the masking unit of PMT (phoneme) and the modeling unit (word-piece). To boost the performance of PMT, we propose multi-modeling unit training (MMUT) architecture fusion with PMT (PM-MMUT). The idea of MMUT framework is to split the Encoder into two parts including acoustic feature sequences to phoneme-level representation (AF-to-PLR) and phoneme-level representation to word-piece-level representation (PLR-to-WPLR). It allows AF-to-PLR to be optimized by an intermediate phoneme-based CTC loss to learn the rich phoneme-level context information brought by PMT. Experimental results on Uyghur ASR show that the proposed approaches outperform obviously the pure PMT. We also conduct experiments on the 960-hour Librispeech benchmark using ESPnet1, which achieves about 10% relative WER reduction on all the test set without LM fusion comparing with the latest official ESPnet1 pre-trained model.

INS-DETAug 6, 2018
Machine Learning Promoting Extreme Simplification of Spectroscopy Equipment

Jianchao Lee, Qiannan Duan, Sifan Bi et al.

The spectroscopy measurement is one of main pathways for exploring and understanding the nature. Today, it seems that racing artificial intelligence will remould its styles. The algorithms contained in huge neural networks are capable of substituting many of expensive and complex components of spectrum instruments. In this work, we presented a smart machine learning strategy on the measurement of absorbance curves, and also initially verified that an exceedingly-simplified equipment is sufficient to meet the needs for this strategy. Further, with its simplicity, the setup is expected to infiltrate into many scientific areas in versatile forms.