CL · Jan 23, 2024

Contrastive Learning in Distilled Models

arXiv:2401.12472v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the need for lightweight, high-performance models for edge NLP applications. It is an incremental advance, since it builds on existing methods such as SimCSE and DistilBERT.

The paper applies contrastive learning to a distilled model in order to improve Semantic Textual Similarity (STS) performance while reducing model size. The resulting model, DistilFace, achieves an average Spearman's correlation of 72.1 on STS tasks, a 34.2% improvement over BERT base.

Natural Language Processing models like BERT can provide state-of-the-art word embeddings for downstream NLP tasks. However, these models have yet to perform well on Semantic Textual Similarity, and may be too large to be deployed as lightweight edge applications. We seek to apply a suitable contrastive learning method, based on the SimCSE paper, to a model architecture adapted from a knowledge-distillation-based model, DistilBERT, to address these two issues. Our final lightweight model, DistilFace, achieves an average of 72.1 in Spearman's correlation on STS tasks, a 34.2 percent improvement over BERT base.
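
To make the contrastive objective concrete, here is a minimal sketch of the in-batch contrastive (InfoNCE) loss used by SimCSE-style training. It is an illustration under stated assumptions, not DistilFace's actual implementation: the function names, the toy 2-dimensional embeddings, and the temperature of 0.05 are chosen for the example, and the abstract does not specify which SimCSE variant or hyperparameters the paper uses. In practice the anchor and positive embeddings would come from two encoder passes over the same sentence.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive loss.

    positives[i] is the positive for anchors[i]; every other row of
    `positives` acts as a negative for that anchor.
    The temperature value is an assumption made for this sketch.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Cross-entropy with the matching index i as the target class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical values, not taken from the paper).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_positives = [[0.9, 0.1], [0.1, 0.9]]   # each positive is near its anchor
shuffled_positives = [[0.1, 0.9], [0.9, 0.1]]  # positives swapped between anchors

loss_aligned = info_nce_loss(anchors, aligned_positives)
loss_shuffled = info_nce_loss(anchors, shuffled_positives)
```

The loss is small when each anchor is closest to its own positive and large when the pairs are mismatched, which is the pressure that pulls embeddings of semantically matching sentences together and pushes the rest of the batch apart.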
