CL · Feb 20, 2024

Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity

arXiv:2402.13130v3 · 23 citations · h-index: 3 · EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses a specific bottleneck for NLP practitioners using ELECTRA for semantic tasks, though the contribution appears incremental: it repairs an existing model rather than introducing a new paradigm.

The paper tackles the problem of ELECTRA's poor sentence embeddings for semantic textual similarity by proposing truncated model fine-tuning (TMFT), which improves Spearman correlation by over 8 points on the STS Benchmark while increasing parameter efficiency.

While BERT produces high-quality sentence embeddings, its pre-training computational cost is a significant drawback. In contrast, ELECTRA provides a cost-effective pre-training objective and downstream task performance improvements, but worse sentence embeddings. The community tacitly stopped utilizing ELECTRA's sentence embeddings for semantic textual similarity (STS). We notice a significant drop in performance for the ELECTRA discriminator's last layer in comparison to prior layers. We explore this drop and propose a way to repair the embeddings using a novel truncated model fine-tuning (TMFT) method. TMFT improves the Spearman correlation coefficient by over $8$ points while increasing parameter efficiency on the STS Benchmark. We extend our analysis to various model sizes, languages, and two other tasks. Further, we discover the surprising efficacy of ELECTRA's generator model, which performs on par with BERT, using significantly fewer parameters and a substantially smaller embedding size. Finally, we observe boosts by combining TMFT with word similarity or domain adaptive pre-training.
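As a rough illustration of the quantities the abstract refers to, the sketch below mean-pools token vectors from a chosen layer (rather than always the last one), scores sentence pairs by cosine similarity, and compares those scores with gold ratings using the Spearman correlation. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `pooled_embedding` and `sts_spearman`, the mean-pooling choice, and the nested-list layout of `hidden_states` are all hypothetical, and the fine-tuning step of TMFT is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def pooled_embedding(hidden_states, layer):
    """Hypothetical helper: mean-pool the token vectors of one layer.

    Assumes hidden_states[layer] is a list of token vectors, so picking
    an earlier `layer` ignores everything above it.
    """
    tokens = hidden_states[layer]
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def sts_spearman(embedding_pairs, gold_scores):
    """Spearman correlation between pairwise cosine scores and gold ratings."""
    predicted = [cosine(u, v) for u, v in embedding_pairs]
    return spearman(predicted, gold_scores)
```

Under these assumptions, comparing `sts_spearman` for embeddings pooled from different layers would show whether an earlier layer tracks the gold similarity ratings better than the last one, which is the kind of comparison the abstract describes.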

Code Implementations: 1 repo