IR · May 13, 2017

Talking to Your TV: Context-Aware Voice Search with Hierarchical Recurrent Neural Networks

arXiv:1705.04892v1 · 18 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of short, ambiguous voice queries from TV viewers, but the advance is incremental: it applies existing neural network methods to a specific domain.

The paper tackles the problem of navigational voice queries in entertainment systems by integrating word- and character-level representations and modeling contextual dependencies in query sequences, resulting in significant performance improvements over models without context and the current deployed product.

We tackle the novel problem of navigational voice queries posed against an entertainment system, where viewers interact with a voice-enabled remote controller to specify the program to watch. This is a difficult problem for several reasons: such queries are short, even shorter than comparable voice queries in other domains, which offers fewer opportunities for deciphering user intent. Furthermore, ambiguity is exacerbated by underlying speech recognition errors. We address these challenges by integrating word- and character-level representations of the queries and by modeling voice search sessions to capture the contextual dependencies in query sequences. Both are accomplished with a probabilistic framework in which recurrent and feedforward neural network modules are organized in a hierarchical manner. From a raw dataset of 32M voice queries from 2.5M viewers on the Comcast Xfinity X1 entertainment system, we extracted data to train and test our models. We demonstrate the benefits of our hybrid representation and context-aware model, which significantly outperforms models without context as well as the current deployed product.
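To make the hierarchy described in the abstract concrete, here is a minimal sketch: a character-level and a word-level recurrent encoder build a hybrid vector for each query, a session-level recurrent layer carries context from one query to the next, and a feedforward softmax layer scores candidate programs. Everything specific here is an assumption made for illustration and is not taken from the paper: the `HierarchicalSketch` class, the plain tanh cells, the layer sizes, and the untrained random weights.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix; a real model would learn these weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def rnn_step(w_in, w_rec, x, h):
    """One step of a plain tanh recurrent cell (assumed cell type)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class HierarchicalSketch:
    """Hypothetical, untrained illustration of a hierarchical query model."""

    N_CHARS = 128  # assumption: ASCII input, other characters share one slot

    def __init__(self, vocab, num_programs, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.vocab = {word: i for i, word in enumerate(vocab)}
        n_words = len(vocab) + 1  # final slot stands for out-of-vocabulary words
        self.char_in = make_matrix(hidden, self.N_CHARS, rng)
        self.char_rec = make_matrix(hidden, hidden, rng)
        self.word_in = make_matrix(hidden, n_words, rng)
        self.word_rec = make_matrix(hidden, hidden, rng)
        self.sess_in = make_matrix(hidden, 2 * hidden, rng)
        self.sess_rec = make_matrix(hidden, hidden, rng)
        self.out = make_matrix(num_programs, hidden, rng)

    def encode_query(self, query):
        """Hybrid representation: character-level state joined with word-level state."""
        h_char = [0.0] * self.hidden
        for ch in query:
            x = one_hot(min(ord(ch), self.N_CHARS - 1), self.N_CHARS)
            h_char = rnn_step(self.char_in, self.char_rec, x, h_char)
        h_word = [0.0] * self.hidden
        for word in query.split():
            x = one_hot(self.vocab.get(word, len(self.vocab)), len(self.vocab) + 1)
            h_word = rnn_step(self.word_in, self.word_rec, x, h_word)
        return h_char + h_word  # list concatenation, length 2 * hidden

    def predict_session(self, queries):
        """Return one distribution over programs per query, conditioned on earlier queries."""
        h_sess = [0.0] * self.hidden
        distributions = []
        for query in queries:
            h_sess = rnn_step(self.sess_in, self.sess_rec, self.encode_query(query), h_sess)
            distributions.append(softmax(matvec(self.out, h_sess)))
        return distributions
```

Calling `HierarchicalSketch(["game", "of", "thrones"], num_programs=4).predict_session(["game of throws", "game of thrones"])` returns two probability distributions over the four hypothetical programs. The second depends on the first query through the session state, which is the contextual dependency the abstract refers to. The character-level path still yields a useful signal when a misrecognized word such as "throws" falls outside the word vocabulary.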
