CV · Aug 11, 2017

Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis

arXiv:1708.04673v2 · 20 citations
AI Analysis

This work addresses acoustic feature learning for speaker-independent phonetic recognition, presenting an incremental improvement over previous methods.

The paper tackles acoustic feature learning when a second, non-acoustic modality is available during training but not at test time. It applies deep variational canonical correlation analysis (VCCA), extends it with improved latent variable priors and adversarial learning, and shows that VCCA-based features improve speaker-independent phonetic recognition on the University of Wisconsin X-ray Microbeam Database.

We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time. We use deep variational canonical correlation analysis (VCCA), a recently proposed deep generative method for multi-view representation learning. We also extend VCCA with improved latent variable priors and with adversarial learning. Compared to other techniques for multi-view feature learning, VCCA's advantages include an intuitive latent variable interpretation and a variational lower bound objective that can be trained end-to-end efficiently. We compare VCCA and its extensions with previous feature learning methods on the University of Wisconsin X-ray Microbeam Database, and show that VCCA-based feature learning improves over previous methods for speaker-independent phonetic recognition.
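
To make the "variational lower bound objective" concrete, here is a minimal sketch of a single-sample estimate of such a bound. It assumes the basic VCCA setup: a standard normal prior on the latent variable, a Gaussian encoder that sees only the acoustic view, and one unit-variance Gaussian decoder per view. The linear maps, dimensions, and every function and variable name are hypothetical illustrations, not the paper's networks, and none of the paper's extensions (improved priors, adversarial learning) are included.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def unit_gaussian_log_likelihood(target, mean):
    """log N(target; mean, I), summed over feature dims."""
    d = target.shape[-1]
    sq_err = np.sum((target - mean) ** 2, axis=-1)
    return -0.5 * sq_err - 0.5 * d * np.log(2.0 * np.pi)

def vcca_lower_bound(x, y, enc_mu, enc_log_var, dec_x, dec_y, rng):
    """Single-sample lower bound per example:
    E_q(z|x)[log p(x|z) + log p(y|z)] - KL(q(z|x) || p(z))."""
    mu, log_var = enc_mu(x), enc_log_var(x)
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    recon = (unit_gaussian_log_likelihood(x, dec_x(z))
             + unit_gaussian_log_likelihood(y, dec_y(z)))
    return recon - gaussian_kl_to_standard_normal(mu, log_var)

# Toy usage with hypothetical linear encoder/decoders (assumed sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))   # stand-in for acoustic features
y = rng.standard_normal((4, 3))   # stand-in for the second (non-acoustic) view
W_mu = 0.1 * rng.standard_normal((6, 2))
W_lv = 0.1 * rng.standard_normal((6, 2))
W_x = 0.1 * rng.standard_normal((2, 6))
W_y = 0.1 * rng.standard_normal((2, 3))

bound = vcca_lower_bound(
    x, y,
    enc_mu=lambda a: a @ W_mu,
    enc_log_var=lambda a: a @ W_lv,
    dec_x=lambda z: z @ W_x,
    dec_y=lambda z: z @ W_y,
    rng=rng,
)
features = x @ W_mu  # encoder mean, computable from the acoustic view alone
```

Because the encoder in this sketch takes only `x`, the second view `y` enters the objective solely through its decoder term. That is what lets features be computed at test time when only acoustics are available.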

