Regularized Contrastive Learning of Semantic Search
This work addresses the need for better retrieval models in semantic search, particularly for long queries and long indices. The contribution is incremental, as it builds on existing transformer-based methods and regularization techniques.
The paper tackles the problem of learning better sentence representations for semantic search by proposing Regularized Contrastive Learning, a regularization method that uses augmented semantic representations as regulators in the contrastive objective to reduce overfitting and alleviate anisotropy. It reports outperforming baselines on 7 semantic search benchmarks and 2 challenging FAQ datasets.
Semantic search is an important task whose objective is to find the relevant index in a database for a given query. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models because of their excellent ability to learn semantic representations, and many regularization methods suited to them have been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning, which helps transformer-based models learn better sentence representations. It first augments several different semantic representations for every sentence, then takes them into the contrastive objective as regulators. These contrastive regulators can overcome overfitting and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks using the strong pre-trained model SRoBERTA. The results show that our method is more effective at learning superior sentence representations. We then evaluate our approach on 2 challenging FAQ datasets, Cough and Faqir, which have long queries and indices. The results of our experiments demonstrate that our method outperforms the baseline methods.
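
To make the idea of regulators in a contrastive objective concrete, the following is a minimal sketch rather than the paper's actual formulation, which the abstract does not spell out. It assumes an InfoNCE-style loss over cosine similarities in which the augmented representations (the regulators) enter as additional contrast terms in the softmax denominator. The function names, the `temperature` value, and the way regulators are combined are all illustrative assumptions, and how the regulators are generated is not covered here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(query, positive, negatives, regulators=(), temperature=0.05):
    """InfoNCE-style loss for a single query embedding.

    ASSUMPTION: `regulators` are extra augmented embeddings that are simply
    appended to the contrast set. This is one plausible reading of
    "regulators in the contrastive objective", not the confirmed method.
    """
    pos_logit = cosine(query, positive) / temperature
    contrast = list(negatives) + list(regulators)
    logits = [pos_logit] + [cosine(query, v) / temperature for v in contrast]

    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))

    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - pos_logit


# Toy 2-dimensional embeddings, purely illustrative.
query = [1.0, 0.0]
positive = [1.0, 0.1]
negatives = [[0.0, 1.0]]
regulators = [[1.0, 0.2]]

base_loss = contrastive_loss(query, positive, negatives)
regularized_loss = contrastive_loss(query, positive, negatives, regulators)
```

Under this assumption, each regulator adds a term to the denominator, so the regularized loss is never smaller than the plain contrastive loss for the same query. The model is thereby pushed to separate the positive from its augmented variants as well as from ordinary negatives.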