CL · Nov 14, 2016

Multi-view Recurrent Neural Acoustic Word Embeddings

arXiv:1611.04496v216.692 citations

Originality: Incremental advance

AI Analysis

This work targets speech retrieval and recognition tasks by providing better whole-word representations. It is incremental in that it builds on existing neural acoustic word embedding techniques.

The authors tackled the problem of learning acoustic word embeddings by proposing a multi-view approach that jointly embeds acoustic and character sequences using deep bidirectional LSTMs and contrastive losses, resulting in improved word discrimination performance over previous methods.

Recent work has begun exploring neural acoustic word embeddings: fixed-dimensional vector representations of arbitrary-length speech segments corresponding to words. Such embeddings are applicable to speech retrieval and recognition tasks, where reasoning about whole words may make it possible to avoid ambiguous sub-word representations. The main idea is to map acoustic sequences to fixed-dimensional vectors such that examples of the same word are mapped to similar vectors, while different-word examples are mapped to very different vectors. In this work we take a multi-view approach to learning acoustic word embeddings, in which we jointly learn to embed acoustic sequences and their corresponding character sequences. We use deep bidirectional LSTM embedding models and multi-view contrastive losses. We study the effect of different loss variants, including fixed-margin and cost-sensitive losses. Our acoustic word embeddings improve over previous approaches for the task of word discrimination. We also present results on other tasks that are enabled by the multi-view approach, including cross-view word discrimination and word similarity.
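To make the idea of a fixed-margin multi-view contrastive loss concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of cosine distance, and the margin value of 0.5 are all hypothetical choices, and the embeddings are hand-written vectors standing in for the outputs of the acoustic and character LSTM encoders.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def multiview_margin_loss(acoustic_emb, char_emb_same, char_emb_diff, margin=0.5):
    """Fixed-margin contrastive loss across two views (hypothetical form).

    acoustic_emb:  embedding of a spoken word segment (acoustic view)
    char_emb_same: embedding of the character sequence of the same word
    char_emb_diff: embedding of the character sequence of a different word
    margin:        assumed value; the source does not state one

    The loss is zero once the matching character embedding is closer to the
    acoustic embedding than the mismatched one by at least `margin`.
    """
    d_pos = cosine_distance(acoustic_emb, char_emb_same)
    d_neg = cosine_distance(acoustic_emb, char_emb_diff)
    return max(0.0, margin + d_pos - d_neg)


# Toy 2-dimensional embeddings, chosen by hand for illustration only.
spoken = [1.0, 0.0]
matching_chars = [1.0, 0.0]
other_chars = [0.0, 1.0]

well_separated = multiview_margin_loss(spoken, matching_chars, other_chars)
badly_separated = multiview_margin_loss(spoken, other_chars, matching_chars)
```

In this toy case the first call incurs no loss because the matching pair is already much closer than the mismatched pair, while the second call, with the roles swapped, is penalized. A cost-sensitive variant would presumably replace the constant margin with one that depends on how different the two words are, but the abstract does not give its exact form.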
