CLSDNov 10, 2016

Landmark-based consonant voicing detection on multilingual corpora

arXiv:1611.03533v16 citations
Originality Incremental advance
AI Analysis

This addresses the problem of speech processing generalization across languages for researchers, but it is incremental as it builds on existing landmark and feature methods.

The paper tackled cross-lingual transfer of consonant voicing detection by testing classifiers on English, Turkish, and Spanish data, finding that CNN features outperformed manual and MFCC-based methods, with CNN features showing better performance in Turkish and Spanish than in English.

This paper tests the hypothesis that distinctive feature classifiers anchored at phonetic landmarks can be transferred cross-lingually without loss of accuracy. Three consonant voicing classifiers were developed: (1) manually selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either averaged across the segment or anchored at the landmark), and(3) acoustic features computed using a convolutional neural network (CNN). All detectors are trained on English data (TIMIT),and tested on English, Turkish, and Spanish (performance measured using F1 and accuracy). Experiments demonstrate that manual features outperform all MFCC classifiers, while CNNfeatures outperform both. MFCC-based classifiers suffer an F1reduction of 16% absolute when generalized from English to other languages. Manual features suffer only a 5% F1 reduction,and CNN features actually perform better in Turkish and Span-ish than in the training language, demonstrating that features capable of representing long-term spectral dynamics (CNN and landmark-based features) are able to generalize cross-lingually with little or no loss of accuracy

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes