A simple method for domain adaptation of sentence embeddings
This work addresses the risk of over-adaptation when finetuning sentence embeddings for domain-specific NLP tasks and offers practitioners a more robust alternative, though the contribution is incremental because it builds on existing embedding methods.
The paper tackles domain adaptation of pre-trained sentence embeddings. It proposes a simple, universal method that uses a Siamese architecture to finetune Google's Universal Sentence Encoder, and it reports improved results over traditional finetuning across several data sets.
Pre-trained sentence embeddings have been shown to be very useful for a variety of NLP tasks. Because training such embeddings requires a large amount of data, they are commonly trained on a broad mix of text. Adapting them to a specific domain could improve results in many cases, but such finetuning is usually problem-dependent and risks over-adapting to the data used for adaptation. In this paper, we present a simple universal method for finetuning Google's Universal Sentence Encoder (USE) using a Siamese architecture. We demonstrate how to apply this approach to a variety of data sets and present results on different data sets representing similar problems. We also compare the approach to traditional finetuning on these data sets. As a further advantage, the approach can be used to combine data sets with different annotations. Finally, we present an embedding finetuned on all data sets in parallel.
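To make the Siamese idea concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a toy linear encoder (`ToyEncoder`, a hypothetical stand-in for USE), cosine similarity as the comparison function, and a mean squared error loss against a similarity label; the abstract specifies none of these choices. The point it illustrates is that both sentences of a pair pass through the same encoder, so any update to the weights affects both branches.

```python
import zlib
import numpy as np

HASH_DIM = 32   # size of the hashed bag-of-words input (illustrative value)
EMBED_DIM = 8   # size of the sentence embedding (illustrative value)


def featurize(sentence: str) -> np.ndarray:
    """Hashed bag-of-words vector; a toy replacement for real text preprocessing."""
    vec = np.zeros(HASH_DIM)
    for token in sentence.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % HASH_DIM] += 1.0
    return vec


class ToyEncoder:
    """Hypothetical stand-in for a pre-trained sentence encoder such as USE."""

    def __init__(self, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=(HASH_DIM, EMBED_DIM))

    def encode(self, sentence: str) -> np.ndarray:
        return featurize(sentence) @ self.weights


def cosine(u: np.ndarray, v: np.ndarray, eps: float = 1e-8) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def siamese_similarity(encoder: ToyEncoder, a: str, b: str) -> float:
    # Both branches call the same encoder object, so they share all weights.
    return cosine(encoder.encode(a), encoder.encode(b))


def pair_loss(encoder: ToyEncoder, pairs) -> float:
    """Mean squared error between predicted similarity and the target label."""
    errors = [(siamese_similarity(encoder, a, b) - y) ** 2 for a, b, y in pairs]
    return float(np.mean(errors))


encoder = ToyEncoder()
# Assumed annotation scheme: 1.0 for similar pairs, 0.0 for dissimilar pairs.
pairs = [
    ("the battery drains quickly", "battery life is short", 1.0),
    ("the battery drains quickly", "shipping was on time", 0.0),
]
loss = pair_loss(encoder, pairs)
print(f"pair loss before any finetuning: {loss:.4f}")
```

In an actual finetuning run, this loss would be minimized with respect to the encoder weights by a gradient-based optimizer. Because the loss depends only on pairs and a target similarity, data sets with different annotation schemes could in principle be mapped to such pairs and mixed, which is one plausible reading of the combination advantage mentioned in the abstract.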