End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations
This work addresses the need for simpler, more efficient keyword search systems in speech processing. It is incremental, since it extends the authors' earlier neural ASR-free model rather than introducing a new approach.
The paper simplifies keyword search with an ASR-free neural model that uses multilingual pretraining. The model does not match a strong ASR-based system on short and in-vocabulary queries, but it outperforms that system on long queries and on queries that do not appear in the training data.
Conventional keyword search systems operate on automatic speech recognition (ASR) outputs, which makes their indexing and search pipelines complex. This has led to interest in ASR-free approaches that simplify the search procedure. We recently proposed a neural ASR-free keyword search model that achieves competitive performance while maintaining an efficient and simplified pipeline: queries and documents are encoded with a pair of recurrent neural network encoders, and the encodings are combined with a dot-product. In this article, we extend this work with multilingual pretraining and a detailed analysis of the model. Our experiments show that the proposed multilingual training significantly improves model performance. Although the proposed model does not match a strong ASR-based conventional keyword search system for short queries and queries comprising in-vocabulary words, it outperforms the ASR-based system for long queries and queries that do not appear in the training data.
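The dual-encoder scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the plain tanh recurrent layer, the random untrained weights, the use of the final query state as the query encoding, the per-frame sigmoid, and all names (make_rnn, run_rnn, score_query, one_hot) are assumptions made for illustration only.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, rng):
    """Randomly initialise a simple tanh recurrent layer (untrained, illustrative)."""
    scale = 1.0 / math.sqrt(hidden_dim)
    w_in = [[rng.uniform(-scale, scale) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-scale, scale) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    return w_in, w_rec


def run_rnn(params, inputs):
    """Return the hidden state produced after each input vector."""
    w_in, w_rec = params
    hidden = [0.0] * len(w_in)
    states = []
    for x in inputs:
        hidden = [
            math.tanh(
                sum(w * v for w, v in zip(w_in[i], x))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(w_in))
        ]
        states.append(hidden)
    return states


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def score_query(query_params, doc_params, query_symbols, doc_frames):
    """Combine the query encoding with each document frame encoding by dot product.

    Assumption: the query is summarised by its final hidden state, and a
    sigmoid maps each dot product to a per-frame score in (0, 1).
    """
    query_vec = run_rnn(query_params, query_symbols)[-1]
    doc_states = run_rnn(doc_params, doc_frames)
    return [1.0 / (1.0 + math.exp(-dot(query_vec, s))) for s in doc_states]


# Hypothetical sizes: 5 query symbols, 4-dimensional acoustic features, 8 hidden units.
rng = random.Random(0)
NUM_SYMBOLS, FEATURE_DIM, HIDDEN_DIM = 5, 4, 8
query_encoder = make_rnn(NUM_SYMBOLS, HIDDEN_DIM, rng)
document_encoder = make_rnn(FEATURE_DIM, HIDDEN_DIM, rng)

query = [one_hot(i, NUM_SYMBOLS) for i in (2, 0, 3)]  # a three-symbol query
document = [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(6)]

scores = score_query(query_encoder, document_encoder, query, document)
```

Both encoders must map into a space of the same dimension for the dot product to be defined. Because the document encodings do not depend on the query, they could be computed once and reused across queries, which is one plausible reading of the simplified pipeline the abstract mentions.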