Miaomiao Ma

h-index 11

3 papers

76 citations

Novelty 57%

AI Score 32

Ranked #136,279 of 205,806 authors (top 66%); #9,246 in AI (top 65%)

3 Papers

NA · Mar 17, 2017

Accuracy Directly Controlled Fast Direct Solutions of General ${\cal H}^2$-Matrices and Its Application to Electrically Large Integral-Equation-Based Electromagnetic Analysis

Miaomiao Ma, Dan Jiao

The dense matrix resulting from an integral equation (IE) based solution of Maxwell's equations can be compactly represented by an ${\cal H}^2$-matrix. Given a general dense ${\cal H}^2$-matrix, prevailing fast direct solutions involve approximations whose accuracy can only be indirectly controlled. In this work, we propose new accuracy-controlled direct solution algorithms, including both factorization and inversion, for solving general ${\cal H}^2$-matrices, which did not exist prior to this work. Unlike existing direct solutions, where the cluster bases are kept unchanged in the solution procedure and explicit accuracy control is therefore lacking, the proposed new algorithms update the cluster bases and their rank level by level based on prescribed accuracy, without increasing computational complexity. Zeros are also introduced level by level such that the size of the matrix blocks computed at each tree level is the rank at that level, and is hence small. The proposed new direct solution has been applied to solve electrically large volume IEs whose rank grows linearly with electrical size. A complexity of $O(N \log N)$ in factorization and inversion time, and a complexity of $O(N)$ in storage and solution time, are both theoretically proven and numerically demonstrated. For constant-rank cases, the proposed direct solution has a strict $O(N)$ complexity in both time and memory. Rapid direct solutions of millions of unknowns can be obtained on a single CPU core with directly controlled accuracy.
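To make "rank chosen from a prescribed accuracy" concrete, the sketch below picks the smallest rank whose discarded singular values stay within a relative tolerance in the Frobenius norm. This is a generic truncation rule and not the authors' level-by-level cluster-basis update; the function name `rank_for_tolerance` and the parameter `eps` are assumptions made for illustration.

```python
import math


def rank_for_tolerance(singular_values, eps):
    """Return the smallest k such that dropping all but the k largest
    singular values leaves a relative Frobenius-norm error <= eps.

    Hypothetical helper for illustration only; it is not taken from
    the paper summarized above.
    """
    s = sorted(singular_values, reverse=True)
    # tail[k] holds the sum of squares of s[k:], accumulated from the
    # small end to limit floating-point cancellation.
    tail = [0.0] * (len(s) + 1)
    for i in range(len(s) - 1, -1, -1):
        tail[i] = tail[i + 1] + s[i] * s[i]
    total = math.sqrt(tail[0])
    for k in range(len(s) + 1):
        if math.sqrt(tail[k]) <= eps * total:
            return k
    return len(s)


sv = [1.0, 0.1, 0.01, 0.001]
print(rank_for_tolerance(sv, 0.05))   # looser tolerance, smaller rank
print(rank_for_tolerance(sv, 1e-3))   # tighter tolerance, larger rank
```

Tightening `eps` can only keep or raise the returned rank, which is the sense in which the tolerance, rather than a fixed rank, drives the approximation.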

AISep 18, 2023

A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting

Yuang Li, Min Zhang, Chang Su et al.

The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. These entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that learns OV-KWS and contextual-ASR tasks. We evaluate our approach on Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves the entity recall compared to the original Whisper model. Moreover, we demonstrate that the OV-KWS can be a plug-and-play module to enhance the ASR error correction methods and frozen Whisper models.

SD · Apr 7, 2024

Cross-Domain Audio Deepfake Detection: Dataset and Analysis

Yuang Li, Min Zhang, Mengxin Ren et al.

Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-domain ADD dataset comprising over 300 hours of speech data that is generated by five advanced zero-shot TTS models. To simulate real-world scenarios, we employ diverse attack methods and audio prompts from different datasets. Experiments show that, through novel attack-augmented training, the Wav2Vec2-large and Whisper-medium models achieve equal error rates of 4.1% and 6.5%, respectively. Additionally, we demonstrate our models' outstanding few-shot ADD ability by fine-tuning with just one minute of target-domain data. Nonetheless, neural codec compressors greatly affect the detection accuracy, necessitating further research.
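The equal error rate (EER) quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below approximates it by sweeping every observed score as a threshold. It assumes that a higher score means "more likely bona fide"; the function name and the toy scores are illustrative and do not come from the paper or its dataset.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by a threshold sweep.

    Assumes higher scores indicate bona fide speech. Hypothetical
    helper for illustration only.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates at the closest crossing.
            best_gap, eer = gap, (far + frr) / 2
    return eer


bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(bonafide, spoof))
```

With finite score sets the two rates rarely cross exactly, so the sweep returns the midpoint at the threshold where they are closest; evaluation toolkits typically interpolate instead.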