AS · AI · SD · Jul 19, 2021

Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks

arXiv:2107.08803v1 · 68 citations
Originality: Incremental advance
AI Analysis

This work addresses the critical issue of robust detection of synthetic speech attacks for speaker verification systems, representing an incremental improvement over existing Res2Net methods.

The paper aims to improve generalizability to unseen synthetic speech attacks in automatic speaker verification (ASV) anti-spoofing. It proposes a channel-wise gated Res2Net (CG-Res2Net) that dynamically selects relevant channels, and reports significant performance gains over Res2Net and other state-of-the-art single systems on the ASVspoof 2019 LA dataset.

Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks. The Res2Net approach designs a residual-like connection between feature groups within one block, which increases the possible receptive fields and improves the system's detection generalizability. However, such a residual-like connection is performed by a direct addition between feature groups without channel-wise priority. We argue that the information across channels may not contribute to spoofing cues equally, and that the less relevant channels should be suppressed before being added to the next feature group, so that the system can generalize better to unseen attacks. This argument motivates the current work, which presents a novel channel-wise gated Res2Net (CG-Res2Net) that modifies Res2Net to enable a channel-wise gating mechanism in the connection between feature groups. This gating mechanism dynamically selects channel-wise features based on the input, to suppress the less relevant channels and enhance the detection generalizability. Three gating mechanisms with different structures are proposed and integrated into Res2Net. Experiments on ASVspoof 2019 logical access (LA) show that the proposed CG-Res2Net significantly outperforms Res2Net on both the overall LA evaluation set and individual difficult unseen attacks. It also outperforms other state-of-the-art single systems, demonstrating the effectiveness of our method.
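
To make the gated connection between feature groups concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify any of the three gate structures, so the gate used here (global average pooling, one linear layer, then a sigmoid) is an assumption, and `channel_gate`, `gated_res2net_block`, and the `tanh` stand-in for each group's convolution are hypothetical names and simplifications chosen for illustration.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def channel_gate(prev, weight, bias):
    """Per-channel gate values in (0, 1), computed from the previous group.

    prev: (C, T, F) feature group; weight: (C, C); bias: (C,).
    Assumed gate form: global average pool -> linear -> sigmoid.
    """
    pooled = prev.mean(axis=(1, 2))           # one scalar per channel, shape (C,)
    return sigmoid(weight @ pooled + bias)    # shape (C,)


def gated_res2net_block(x, scale, gate_params, transform=np.tanh):
    """Split x (C_total, T, F) into `scale` channel groups and chain them.

    Group 0 passes through unchanged, group 1 is only transformed, and every
    later group first receives the gated output of the previous group.
    `transform` stands in for the per-group convolution of a real block.
    gate_params[i - 2] holds the (weight, bias) pair used for group i >= 2.
    """
    groups = np.split(x, scale, axis=0)
    outputs = [groups[0]]
    for i in range(1, scale):
        inp = groups[i]
        if i >= 2:
            weight, bias = gate_params[i - 2]
            gate = channel_gate(outputs[-1], weight, bias)
            # Plain Res2Net would add outputs[-1] directly; here each channel
            # is scaled by its gate before the addition.
            inp = inp + gate[:, None, None] * outputs[-1]
        outputs.append(transform(inp))
    return np.concatenate(outputs, axis=0)


rng = np.random.default_rng(0)
scale, channels = 4, 2
x = rng.standard_normal((scale * channels, 5, 3))
params = [(rng.standard_normal((channels, channels)), np.zeros(channels))
          for _ in range(scale - 2)]
y = gated_res2net_block(x, scale, params)
print(y.shape)  # (8, 5, 3)
```

Because the gate is computed from the features themselves, the amount each channel contributes to the next group changes with the input. Driving a gate toward zero removes that channel from the addition, which is the suppression of less relevant channels that the abstract describes.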

Code Implementations: 2 repos
