SD AS · Jul 29, 2021

Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection

Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi

arXiv:2107.14132v2 · 17.338 citations

Originality: Incremental advance

AI Analysis

This work addresses spoof detection in speech processing, which matters for security applications. The contribution is incremental because it builds on existing multi-task learning methods and network architectures.

The paper tackles simultaneous spoof detection at the segmental and utterance levels on the PartialSpoof database. It proposes a multi-task learning framework built on SELCNN followed by Bi-LSTM, and reports that a binary-branch architecture with fine-tuning performs better than single-task models, except for segmental detection on the evaluation set.

In this paper, we provide a series of multi-tasking benchmarks for simultaneously detecting spoofing at the segmental and utterance levels in the PartialSpoof database. First, we propose the SELCNN network, which inserts squeeze-and-excitation (SE) blocks into a light convolutional neural network (LCNN) to enhance the capacity of hidden feature selection. Then, we implement multi-task learning (MTL) frameworks with SELCNN followed by bidirectional long short-term memory (Bi-LSTM) as the basic model. We discuss MTL in PartialSpoof in terms of architecture (uni-branch/multi-branch) and training strategies (from-scratch/warm-up) step-by-step. Experiments show that the multi-task model performs relatively better than single-task models. Also, in MTL, a binary-branch architecture more adequately utilizes information from two levels than a uni-branch model. For the binary-branch architecture, fine-tuning a warm-up model works better than training from scratch. Models can handle both segment-level and utterance-level predictions simultaneously overall under a binary-branch multi-task architecture. Furthermore, the multi-task model trained by fine-tuning a segmental warm-up model performs relatively better at both levels except on the evaluation set for segmental detection. Segmental detection should be explored further.
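The abstract says SELCNN inserts squeeze-and-excitation (SE) blocks into an LCNN to improve hidden feature selection. As a rough illustration of what a generic SE block computes, the sketch below gates each channel of a feature map by a learned weight in (0, 1). It is not the authors' implementation. The function names (`se_block`, `dense`), the channel-by-time layout, the bottleneck size, and all weight values are assumptions made for this example, and the real model operates on learned tensors inside a neural network framework.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights, bias):
    """Fully connected layer. `weights` holds one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def se_block(feature_maps, w1, b1, w2, b2):
    """Generic squeeze-and-excitation gate (hypothetical helper).

    feature_maps: list of C channels, each a list of T frame values.
    w1, b1: bottleneck layer mapping C values to a smaller hidden size.
    w2, b2: expansion layer mapping the hidden size back to C values.
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: summarize each channel by its average over time.
    squeezed = [sum(channel) / len(channel) for channel in feature_maps]

    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]

    # Scale: multiply every frame of a channel by that channel's gate.
    scaled = [
        [gate * value for value in channel]
        for gate, channel in zip(gates, feature_maps)
    ]
    return scaled, gates


# Illustrative usage: 2 channels, 3 frames, bottleneck of size 1.
# All weights below are made-up numbers, not trained parameters.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w1, b1 = [[0.5, -0.25]], [0.1]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]
scaled, gates = se_block(features, w1, b1, w2, b2)
print("gates:", gates)
```

The gates depend only on per-channel averages, so the block adds few parameters while letting the network emphasize some channels and suppress others. In a multi-task setup like the one described, such gated features would then feed the later layers that produce segment-level and utterance-level scores.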
