CL · AI · LG · Aug 19, 2024

Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores

arXiv:2408.11868v1 · 6 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses text retrieval for real-world applications such as online-shopping Q&A, especially when labeled data is scarce, but its contribution is incremental.

The paper aims to improve text embedding models for semantic textual similarity and retrieval by applying contrastive fine-tuning to small datasets augmented with expert scores. The results show improved performance over a benchmark model across multiple metrics on MTEB.

This paper presents an approach for improving text embedding models through contrastive fine-tuning on small datasets augmented with expert scores. It focuses on semantic textual similarity and text retrieval. The method fine-tunes embedding models with soft labels derived from expert-augmented scores, aiming to improve retrieval capability while preserving the models' versatility. The evaluation uses a Q&A dataset from an online shopping website, with eight expert models supplying the scores. Results show improved performance over a benchmark model across multiple metrics on various retrieval tasks from the Massive Text Embedding Benchmark (MTEB). The authors present the method as cost-effective and practical for real-world applications, especially when labeled data is scarce.
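The summary does not spell out the exact loss. As a hedged illustration only, the pure-Python sketch below shows one common way to turn expert scores into soft labels for contrastive fine-tuning. It averages each candidate's similarity score across the expert models, converts those averages into a target distribution with a temperature-scaled softmax, and scores the student embedding by the cross-entropy between that target and the student's own softmax over cosine similarities. The function names, the plain averaging rule, and the temperature of 0.05 are all assumptions made for illustration, not the paper's published method.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expert_soft_labels(expert_scores: List[List[float]]) -> List[float]:
    """Assumed aggregation rule: plain mean over experts.

    expert_scores[k][i] is expert k's similarity score for candidate i.
    """
    n_experts = len(expert_scores)
    n_candidates = len(expert_scores[0])
    return [
        sum(expert_scores[k][i] for k in range(n_experts)) / n_experts
        for i in range(n_candidates)
    ]


def log_softmax(xs: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of xs / temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def soft_contrastive_loss(
    query_emb: Sequence[float],
    candidate_embs: Sequence[Sequence[float]],
    soft_scores: Sequence[float],
    temperature: float = 0.05,  # hypothetical value, not from the paper
) -> float:
    """Cross-entropy between the expert-derived target distribution and
    the student model's distribution over candidates (soft-label InfoNCE)."""
    student_sims = [cosine(query_emb, c) for c in candidate_embs]
    log_p = log_softmax(student_sims, temperature)
    target = [math.exp(lt) for lt in log_softmax(soft_scores, temperature)]
    return -sum(t * lp for t, lp in zip(target, log_p))
```

For example, with a query embedding `[1.0, 0.0]` and expert soft scores `[0.9, 0.1]`, the loss is near zero when the first candidate embedding is `[1.0, 0.0]` and large when the candidate order is swapped, so gradient descent on this loss would pull the student's ranking toward the experts' consensus. Unlike hard positive/negative labels, the soft target keeps graded information about partially relevant answers.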
