CL · Oct 24, 2023

Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words

Hiroto Kurita, Goro Kobayashi, Sho Yokoi, Kentaro Inui

arXiv:2310.15921v121.4134 citations · h-index: 11 · Has Code

Originality: Incremental advance

AI Analysis

The paper gives NLP practitioners insight into the mechanisms of contrastive learning, though the advance is incremental: it explains existing methods rather than introducing new ones.

The paper investigates how contrastive learning improves sentence encoders by showing that they implicitly weight words based on informativeness, with more informative words receiving higher weights, supported by theoretical analysis and experiments across models and datasets.

The performance of sentence encoders can be significantly improved through the simple practice of fine-tuning using contrastive loss. A natural question arises: what characteristics do models acquire during contrastive learning? This paper theoretically and experimentally shows that contrastive-based sentence encoders implicitly weight words based on information-theoretic quantities; that is, more informative words receive greater weight, while others receive less. The theory states that, in the lower bound of the optimal value of the contrastive learning objective, the norm of a word embedding reflects the information gain associated with the distribution of surrounding words. We also conduct comprehensive experiments using various models, multiple datasets, two methods to measure the implicit weighting of models (Integrated Gradients and SHAP), and two information-theoretic quantities (information gain and self-information). The results provide empirical evidence that contrastive fine-tuning emphasizes informative words.
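To make the two information-theoretic quantities concrete, here is a minimal sketch on a toy corpus. It is not the paper's code. The corpus, the function names, and the choice of "surrounding words" as the other words in the same sentence are all assumptions made for illustration. Self-information is computed as -log2 p(w), and information gain is taken here to be the KL divergence between the context distribution given a word and the overall unigram distribution, which is one common reading of the abstract's description.

```python
import math
from collections import Counter

# Toy corpus (hypothetical; not taken from the paper).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat chased the dog".split(),
]

def unigram_probs(sentences):
    """Relative frequency of each word across all sentences."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def self_information(word, probs):
    """Self-information in bits: -log2 p(word)."""
    return -math.log2(probs[word])

def information_gain(word, sentences, probs):
    """KL(p(context | word) || p(context)) in bits.

    Assumption: the context of a word occurrence is every other
    token in the same sentence.
    """
    ctx = Counter()
    for s in sentences:
        for i, w in enumerate(s):
            if w == word:
                ctx.update(s[:i] + s[i + 1:])
    total = sum(ctx.values())
    if total == 0:
        return 0.0  # word never appears with any context
    return sum(
        (c / total) * math.log2((c / total) / probs[cw])
        for cw, c in ctx.items()
    )

probs = unigram_probs(corpus)
for w in ["the", "cat", "rug"]:
    print(w, round(self_information(w, probs), 3),
          round(information_gain(w, corpus, probs), 3))
```

On this corpus the frequent word "the" has lower self-information than the rare word "rug". Under the paper's claim, a contrastively fine-tuned encoder would be expected to give words with higher values of such quantities more weight; the sketch only computes the quantities and does not measure any model.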
