Qian Ge

4papers

29citations

Novelty63%

AI Score27

Ranked #161,539 of 205,806 authors (top 78%)#27,825 in CL (top 86%)

4 Papers

CLFeb 5, 2023

MAC: A unified framework boosting low resource automatic speech recognition

Zeping Min, Qian Ge, Zhong Li et al.

We propose a unified framework for low resource automatic speech recognition tasks named meta audio concatenation (MAC). It is easy to implement and can be carried out in extremely low resource environments. Mathematically, we give a clear description of MAC framework from the perspective of bayesian sampling. In this framework, we leverage a novel concatenative synthesis text-to-speech system to boost the low resource ASR task. By the concatenative synthesis text-to-speech system, we can integrate language pronunciation rules and adjust the TTS process. Furthermore, we propose a broad notion of meta audio set to meet the modeling needs of different languages and different scenes when using the system. Extensive experiments have demonstrated the great effectiveness of MAC on low resource ASR tasks. For CTC greedy search, CTC prefix, attention, and attention rescoring decode mode in Cantonese ASR task, Taiwanese ASR task, and Japanese ASR task the MAC method can reduce the CER by more than 15\%. Furthermore, in the ASR task, MAC beats wav2vec2 (with fine-tuning) on common voice datasets of Cantonese and gets really competitive results on common voice datasets of Taiwanese and Japanese. Among them, it is worth mentioning that we achieve a \textbf{10.9\%} character error rate (CER) on the common voice Cantonese ASR task, bringing about \textbf{30\%} relative improvement compared to the wav2vec2 (with fine-tuning).

LGNov 18, 2022

Why the pseudo label based semi-supervised learning algorithm is effective?

Zeping Min, Qian Ge, Cheng Tai

Recently, pseudo label based semi-supervised learning has achieved great success in many fields. The core idea of the pseudo label based semi-supervised learning algorithm is to use the model trained on the labeled data to generate pseudo labels on the unlabeled data, and then train a model to fit the previously generated pseudo labels. We give a theory analysis for why pseudo label based semi-supervised learning is effective in this paper. We mainly compare the generalization error of the model trained under two settings: (1) There are N labeled data. (2) There are N unlabeled data and a suitable initial model. Our analysis shows that, firstly, when the amount of unlabeled data tends to infinity, the pseudo label based semi-supervised learning algorithm can obtain model which have the same generalization error upper bound as model obtained by normally training in the condition of the amount of labeled data tends to infinity. More importantly, we prove that when the amount of unlabeled data is large enough, the generalization error upper bound of the model obtained by pseudo label based semi-supervised learning algorithm can converge to the optimal upper bound with linear convergence rate. We also give the lower bound on sampling complexity to achieve linear convergence rate. Our analysis contributes to understanding the empirical successes of pseudo label-based semi-supervised learning.

SDOct 27, 2022

SAN: a robust end-to-end ASR model architecture

Zeping Min, Qian Ge, Guanhua Huang

In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. Specifically, SAN constructs two sub-networks to differentiate the audio feature input and then introduces a loss to unify the output distribution of these sub-networks. Adversarial learning enables the network to capture more essential acoustic features and helps the models achieve better performance when encountering fuzzy audio input. We conduct numerical experiments with the SAN model on several datasets for the automatic speech recognition task. All experimental results show that the siamese adversarial nets significantly reduce the character error rate (CER). Specifically, we achieve a new state of art 4.37 CER without language model on the AISHELL-1 dataset, which leads to around 5% relative CER reduction. To reveal the generality of the siamese adversarial net, we also conduct experiments on the phoneme recognition task, which also shows the superiority of the siamese adversarial network.

CRDec 14, 2016

Your Processor Leaks Information - and There's Nothing You Can Do About It

Qian Ge, Yuval Yarom, Frank Li et al.

Timing channels are information flows, encoded in the relative timing of events, that bypass the system's protection mechanisms. Any microarchitectural state that depends on execution history and affects the rate of progress of later executions potentially establishes a timing channel, unless explicit steps are taken to close it. Such state includes CPU caches, TLBs, branch predictors and prefetchers; removing the channels requires that the OS can partition such state or flush it on a switch of security domains. We measure the capacities of channels based on these microarchitectural features on several generations of processors across the two mainstream ISAs, x86 and ARM, and investigate the effectiveness of the flushing mechanisms provided by the respective ISA.We find that in all processors we studied, at least one significant channel remains. This implies that closing all timing channels seems impossible on contemporary mainstream processors.