IR · CL · Apr 23, 2020

Distilling Knowledge for Fast Retrieval-based Chat-bots

arXiv:2004.11045v1 · 32 citations
AI Analysis

This work addresses an efficiency problem for developers of customer support and information-seeking conversational systems: it provides a method that speeds up chatbot response retrieval while maintaining accuracy. The contribution is incremental, since it builds on existing distillation and encoder techniques.

The paper tackles the trade-off between accuracy and speed in retrieval-based chatbots. It proposes a new cross-encoder architecture and uses knowledge distillation to transfer that model's performance to a faster bi-encoder. The bi-encoder's results improve at no additional inference-time cost, as validated on three datasets.

Response retrieval is a subset of neural ranking in which a model selects a suitable response from a set of candidates given a conversation history. Retrieval-based chat-bots are typically employed in information-seeking conversational systems such as customer support agents. To make pairwise comparisons between a conversation history and a candidate response, two approaches are common: cross-encoders, which perform full self-attention over the pair, and bi-encoders, which encode the two parts of the pair separately. The former gives better prediction quality but is too slow for practical use. In this paper, we propose a new cross-encoder architecture and transfer knowledge from this model to a bi-encoder model using distillation. This effectively boosts bi-encoder performance at no cost during inference time. We perform a detailed analysis of this approach on three response retrieval datasets.
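The distillation idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's loss function, so the code below assumes a generic temperature-softened distillation objective (teacher soft targets mixed with the gold-label loss); it is not the authors' implementation. All function names, the `alpha` and `temperature` parameters, and the toy vectors and teacher scores are hypothetical. The bi-encoder is reduced to a dot product between precomputed vectors, which is what lets candidate encodings be cached ahead of inference.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw candidate scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bi_encoder_scores(history_vec, candidate_vecs):
    """Score each candidate by a dot product with the history vector.
    History and candidates are encoded separately, so candidate vectors
    can be computed once and reused across queries."""
    return [dot(history_vec, c) for c in candidate_vecs]

def cross_entropy(target_probs, predicted_probs):
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

def distillation_loss(student_scores, teacher_scores, gold_index,
                      alpha=0.5, temperature=2.0):
    """Assumed objective: alpha * soft-target loss + (1 - alpha) * gold-label loss."""
    soft_targets = softmax(teacher_scores, temperature)  # cross-encoder (teacher)
    student_soft = softmax(student_scores, temperature)  # bi-encoder (student)
    soft_loss = cross_entropy(soft_targets, student_soft)
    hard_loss = -math.log(softmax(student_scores)[gold_index])
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy example: one conversation history and three candidate responses.
history_vec = [0.2, 0.9]                               # hypothetical encoding
candidate_vecs = [[0.1, 1.0], [1.0, 0.0], [0.5, 0.5]]  # hypothetical encodings
student_scores = bi_encoder_scores(history_vec, candidate_vecs)
teacher_scores = [4.0, -1.0, 0.5]                      # hypothetical cross-encoder scores
loss = distillation_loss(student_scores, teacher_scores, gold_index=0)
print(f"distillation loss: {loss:.4f}")
```

The teacher is needed only while this loss is computed during training. At inference the student ranks candidates with `bi_encoder_scores` alone, which is consistent with the abstract's claim that the gain comes at no inference-time cost.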
