Extracting Similar Questions From Naturally-occurring Business Conversations
This work addresses the challenge of extracting similar questions from business conversations for users such as analysts or managers. Its contribution is incremental: it builds on existing embedding models with task-specific tuning.
The paper tackles the problem of identifying semantically similar questions in real-world English business conversations. It finds that off-the-shelf contextualized embedding models perform poorly because their sentence representations occupy a narrow region of the embedding space. It then demonstrates a method that uses tuned representations and a small set of exemplars to group questions for data exploration or employee coaching.
Pre-trained contextualized embedding models such as BERT are a standard building block in many natural language processing systems. We demonstrate that the sentence-level representations produced by some off-the-shelf contextualized embedding models have a narrow distribution in the embedding space, and thus perform poorly for the task of identifying semantically similar questions in real-world English business conversations. We describe a method that uses appropriately tuned representations and a small set of exemplars to group questions of interest to business users in a visualization that can be used for data exploration or employee coaching.
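To make the "narrow distribution" claim and the exemplar-based grouping concrete, here is a minimal Python sketch under stated assumptions. It measures how tightly a set of sentence vectors clusters using the mean pairwise cosine similarity. It then applies mean-centering as a simple stand-in for "appropriately tuned representations" (not necessarily the tuning the paper uses) and assigns each question to its most similar labeled exemplar. The toy 3-dimensional vectors, the exemplar labels, the 0.8 threshold, and the helper names (`mean_pairwise_cosine`, `mean_center`, `assign_to_exemplars`) are all hypothetical. Real embeddings would come from a model such as BERT.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine(vectors: List[Vector]) -> float:
    """Average cosine over all distinct pairs; values near 1.0 mean the
    vectors sit in a narrow cone of the embedding space."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def mean_center(vectors: List[Vector]) -> List[List[float]]:
    """Subtract the corpus mean vector. Assumed stand-in for 'tuned'
    representations; not necessarily what the paper does."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


def assign_to_exemplars(
    questions: Dict[str, Vector],
    exemplars: Dict[str, Vector],
    threshold: float,
) -> Dict[str, Optional[str]]:
    """Label each question with its most similar exemplar, or None when no
    exemplar reaches the (hypothetical) similarity threshold."""
    groups: Dict[str, Optional[str]] = {}
    for text, q_vec in questions.items():
        label, score = max(
            ((name, cosine(q_vec, e_vec)) for name, e_vec in exemplars.items()),
            key=lambda item: item[1],
        )
        groups[text] = label if score >= threshold else None
    return groups


# Toy vectors (made up): a large shared first component mimics a narrow distribution.
raw_embeddings = [[10.0, 1.0, 0.0], [10.0, 0.9, 0.1], [10.0, 0.0, 1.0], [10.0, 0.1, 0.9]]
print("raw mean cosine:", round(mean_pairwise_cosine(raw_embeddings), 3))
print("centered mean cosine:", round(mean_pairwise_cosine(mean_center(raw_embeddings)), 3))

# Hypothetical exemplars and questions, already in a well-spread space.
exemplars = {"pricing": [1.0, 0.0, 0.0], "scheduling": [0.0, 1.0, 0.0]}
questions = {
    "How much does the premium tier cost?": [0.9, 0.1, 0.0],
    "Can we move the demo to Friday?": [0.1, 0.8, 0.1],
    "Who else is joining the call?": [0.1, 0.1, 0.9],
}
for text, label in assign_to_exemplars(questions, exemplars, threshold=0.8).items():
    print(f"{label or 'ungrouped'}: {text}")
```

A mean pairwise cosine near 1.0 on the raw vectors shows why cosine scores from such representations barely separate related from unrelated questions. After centering, the same vectors spread out. The threshold leaves off-topic questions ungrouped instead of forcing them onto the nearest exemplar.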