ConveRT for FAQ Answering
This work addresses the challenge of building efficient FAQ chatbots for non-English languages. The contribution is incremental, as it adapts an existing method to a new domain.
The paper tackles the problem of adapting English conversational retrieval models to other languages with limited training data. Specifically, it applies a novel pre-training procedure to ConveRT for Dutch FAQ answering on COVID-19 vaccine topics, and shows that the adapted model outperforms an open-source alternative in both low- and high-data regimes.
Knowledgeable FAQ chatbots are a valuable resource to any organization. While powerful and efficient retrieval-based models exist for English, this is rarely the case for other languages, for which the same amount of training data is not available. In this paper, we propose a novel pre-training procedure to adapt ConveRT, an English conversational retriever model, to other languages with less training data available. We apply it for the first time to the task of Dutch FAQ answering related to the COVID-19 vaccine. We show that it performs better than an open-source alternative in both a low-data regime and a high-data regime.