AS, AI, LG, SD, SP | Nov 30, 2022

Preliminary Study on SSCF-derived Polar Coordinate for ASR

arXiv:2212.01245v1 | 1 citation | h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses speech recognition accuracy for mixed and cross-gender scenarios, but it is incremental as the representations are not much more gender-independent than conventional MFCCs.

The authors aim to improve Automatic Speech Recognition (ASR) by describing acoustic trajectories with polar coordinates derived from Spectral Subband Centroids (SSCs). On the BRAF100 dataset, these coordinates achieve significantly higher accuracy than angle-based representations, with further gains from adding derivatives, especially in cross-female recognition.

The transition angles are defined to describe vowel-to-vowel transitions in the acoustic space of the Spectral Subband Centroids (SSCs), and the findings show that they are similar across speakers and speaking rates. In this paper, we investigate using polar coordinates instead of angles to describe a speech signal by characterizing its acoustic trajectory, and we apply them to Automatic Speech Recognition (ASR). In experiments on the BRAF100 dataset, the polar coordinates achieved significantly higher accuracy than the angles in mixed and cross-gender speech recognition, demonstrating that they are better at capturing the acoustic trajectory of the speech signal. Accuracy improved significantly further when the polar coordinates were combined with their first- and second-order derivatives ($\Delta$, $\Delta\Delta$), especially in cross-female recognition. However, the results showed that they were not much more gender-independent than conventional Mel-frequency Cepstral Coefficients (MFCCs).
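The pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact construction, so the sketch assumes the standard subband-centroid definition (a power-weighted mean frequency per subband), assumes the polar coordinates are the radius and angle of the frame-to-frame displacement in a plane spanned by two SSCs, and uses the common regression formula for $\Delta$ features. The function names, the `gamma` exponent, the band edges, and the window size `n` are all hypothetical choices made for illustration.

```python
import numpy as np


def subband_centroids(power_spec, freqs, edges, gamma=1.0):
    """Power-weighted mean frequency per subband (assumed SSC definition).

    power_spec: array (frames, bins); freqs: array (bins,) in Hz;
    edges: increasing subband boundaries in Hz (hypothetical choice).
    """
    cents = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        w = power_spec[:, band] ** gamma
        # Small constant avoids division by zero for silent subbands.
        cents.append((w * freqs[band]).sum(axis=1) / (w.sum(axis=1) + 1e-12))
    return np.stack(cents, axis=1)  # (frames, n_subbands)


def trajectory_polar(ssc_pair):
    """Radius and angle of each frame-to-frame step in a 2-D SSC plane.

    Assumption: the trajectory is described by its displacement vectors.
    ssc_pair: array (frames, 2). Returns array (frames - 1, 2).
    """
    d = np.diff(ssc_pair, axis=0)
    radius = np.hypot(d[:, 0], d[:, 1])
    angle = np.arctan2(d[:, 1], d[:, 0])  # radians in (-pi, pi]
    return np.stack([radius, angle], axis=1)


def delta(feat, n=2):
    """Regression-based first-order derivative over a +/- n frame window.

    Apply twice to obtain the second-order (delta-delta) features.
    Note: the angle column wraps at +/- pi, which this sketch ignores.
    """
    feat = np.asarray(feat, dtype=float)
    num_frames = feat.shape[0]
    padded = np.pad(feat, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(feat)
    for k in range(1, n + 1):
        ahead = padded[n + k : n + k + num_frames]
        behind = padded[n - k : n - k + num_frames]
        out += k * (ahead - behind)
    return out / denom
```

Under these assumptions, a feature vector per frame would be formed by concatenating the output of `trajectory_polar` with `delta` of it and `delta` applied twice, mirroring the $\Delta$ and $\Delta\Delta$ augmentation the abstract reports as most helpful.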

