CL · Aug 2, 2021

Dynamic Multi-scale Convolution for Dialect Identification

arXiv:2108.07787v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work improves dialect identification for speech processing applications, but the advance is incremental because it builds on existing TDNN-based methods.

The paper tackles dialect identification by addressing a weakness of Time Delay Neural Networks: they neglect subtle variations across different feature scales. It proposes a dynamic multi-scale convolution architecture that significantly outperforms state-of-the-art systems, reaching a Cavg of 0.067 and an EER of 6.52%, which correspond to relative improvements of 9% and 45%, respectively.
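The summary above reports EER and relative improvements. As a minimal sketch (not the OLR 2020 scoring tool), the EER is the operating point where the false-acceptance rate equals the false-rejection rate, and the relative improvement of a lower-is-better metric is (baseline - proposed) / baseline. The function names `equal_error_rate` and `relative_improvement` are hypothetical, and the threshold sweep is a simple approximation that reports the mean of the two rates where they are closest. Cavg, which additionally averages weighted miss and false-alarm probabilities over target/non-target language pairs, is not reproduced here.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float], nontarget_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping every observed score as a decision threshold.

    A trial is accepted when its score >= threshold. The EER is taken at the
    threshold where the false-acceptance and false-rejection rates are closest,
    reported as their mean.
    """
    best_gap, best_eer = float("inf"), 1.0
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False acceptances: non-target trials scored at or above the threshold.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        # False rejections: target trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of a lower-is-better metric such as Cavg or EER."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Toy scores for illustration only; they are not data from the paper.
    eer = equal_error_rate([0.9, 0.8, 0.3], [0.7, 0.2, 0.1])
    print(f"EER = {eer:.3f}")  # one miss and one false alarm at threshold 0.7
    print(f"relative improvement = {relative_improvement(0.10, 0.09):.0%}")
```

Because the quoted 9% and 45% figures are rounded, inverting `relative_improvement` with them recovers the best known baseline results only approximately.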

Time Delay Neural Network (TDNN)-based methods are widely used in dialect identification. However, previous TDNN applications neglect subtle variations across different feature scales. To address this issue, we propose a new architecture, named dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling. Dynamic kernel convolution adaptively captures features between short-term and long-term context. Local multi-scale learning, which represents multi-scale features at a granular level, increases the range of receptive fields of the convolution operation. In addition, global multi-scale pooling aggregates features from different bottleneck layers in order to collect information from multiple aspects. The proposed architecture significantly outperforms the state-of-the-art system on the AP20-OLR-dialect-task of the oriental language recognition (OLR) challenge 2020, with the best average cost performance (Cavg) of 0.067 and the best equal error rate (EER) of 6.52%. Compared with the best known results, our method achieves relative improvements of 9% in Cavg and 45% in EER. Furthermore, the proposed model has 91% fewer parameters than the best known model.
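The abstract states that local multi-scale learning widens the receptive field of the convolution. The exact block is not specified here, so the following is a minimal sketch assuming a Res2Net-style hierarchical split (as used, for example, in ECAPA-TDNN): the channels are split into groups, the first group is passed through unchanged, and each later group is convolved after adding the previous group's output. Under that assumption, group i covers 1 + (i - 1)(k - 1)d frames for kernel size k and dilation d. All function names are hypothetical, each group is reduced to a single channel, and one fixed box kernel is shared across groups so the growth can be checked with an impulse.

```python
from typing import List, Optional

Signal = List[float]


def conv1d_same(x: Signal, kernel: Signal, dilation: int = 1) -> Signal:
    """Dilated 1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the taps are centred on each frame.
    """
    offset = (len(kernel) - 1) // 2 * dilation
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = t - offset + j * dilation
            if 0 <= src < len(x):
                acc += w * x[src]
        out.append(acc)
    return out


def res2net_style_split(groups: List[Signal], kernel: Signal, dilation: int = 1) -> List[Signal]:
    """Hierarchical multi-scale processing over channel groups (Res2Net-style assumption).

    y_1 = x_1, y_2 = K(x_2), y_i = K(x_i + y_{i-1}) for i > 2.
    The same kernel is shared here for brevity; a real layer learns one per group.
    """
    outputs: List[Signal] = []
    prev: Optional[Signal] = None
    for i, x in enumerate(groups):
        if i == 0:
            y = list(x)  # first group: identity, smallest receptive field
        elif i == 1:
            y = conv1d_same(x, kernel, dilation)
        else:
            assert prev is not None
            # Adding the previous group's output stacks one more convolution.
            y = conv1d_same([a + b for a, b in zip(x, prev)], kernel, dilation)
        outputs.append(y)
        prev = y
    return outputs


def span(y: Signal) -> int:
    """Number of frames between the first and last non-zero output (inclusive)."""
    nz = [t for t, v in enumerate(y) if v != 0.0]
    return nz[-1] - nz[0] + 1 if nz else 0


if __name__ == "__main__":
    frames, groups, kernel = 21, 4, [1.0, 1.0, 1.0]
    impulse = [1.0 if t == frames // 2 else 0.0 for t in range(frames)]
    for d in (1, 2):
        outs = res2net_style_split([impulse] * groups, kernel, dilation=d)
        spans = [span(y) for y in outs]
        expected = [1 + i * (len(kernel) - 1) * d for i in range(groups)]
        print(f"dilation={d}: spans={spans}, formula={expected}")
```

With a 3-tap kernel and no dilation, the four groups cover 1, 3, 5, and 7 frames, so a single layer mixes several temporal scales at little extra cost; a dilation of 2 widens these to 1, 5, 9, and 13 frames.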

Code Implementations: 1 repo
