CL · Aug 13, 2022

MetricBERT: Text Representation Learning via Self-Supervised Triplet Training

Itzik Malkiel, Dvir Ginzburg, Oren Barkan, Avi Caciularu, Yoni Weill, Noam Koenigstein

arXiv:2208.06610v1 · 1.6 · 15 citations · h-index: 28

Originality: Incremental advance

AI Analysis

This work addresses the need for better text similarity models in recommendation systems, though the contribution appears incremental, since it builds on BERT by adding a new training objective.

The authors tackled the problem of learning text representations for similarity-based recommendation by introducing MetricBERT, which trains BERT with a similarity-metric (triplet) objective alongside masked-language modeling. They show that it outperforms state-of-the-art alternatives, sometimes by a substantial margin.

We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the "traditional" masked-language task. We focus on downstream tasks of learning similarities for recommendations where we show that MetricBERT outperforms state-of-the-art alternatives, sometimes by a substantial margin. We conduct extensive evaluations of our method and its different variants, showing that our training objective is highly beneficial over a traditional contrastive loss, a standard cosine similarity objective, and six other baselines. As an additional contribution, we publish a dataset of video game descriptions along with a test set of similarity annotations crafted by a domain expert.
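The abstract describes the objective only at a high level: a metric-learning term trained jointly with masked-language modeling. As a rough, non-authoritative illustration of what such a joint loss can look like, the sketch assumes a hinge triplet loss over Euclidean distances between text embeddings, added to the usual MLM cross-entropy with a weighting factor. The function names, the Euclidean distance, the `margin`, and the `weight` parameter are all hypothetical choices for illustration, not details taken from the paper.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Hinge triplet loss: zero once the positive is closer than the
    negative by at least `margin`, otherwise penalize the violation."""
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def mlm_loss(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean cross-entropy over masked positions.
    `logits` holds one vocabulary-sized row per masked token."""
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[target]
    return total / len(targets)


def combined_loss(anchor, positive, negative, logits, targets,
                  margin: float = 1.0, weight: float = 1.0) -> float:
    """Hypothetical joint objective: MLM term plus a weighted triplet term."""
    return mlm_loss(logits, targets) + weight * triplet_loss(
        anchor, positive, negative, margin)


if __name__ == "__main__":
    anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [0.0, 1.5]
    # Triplet term: 1.0 - 1.5 + 1.0 = 0.5; MLM term: log(2) for uniform logits.
    print(combined_loss(anchor, positive, negative, [[0.0, 0.0]], [0]))
```

In a real model the embeddings and logits would come from the same BERT encoder, so both terms update shared weights; the `weight` factor is one simple way to trade off language modeling against metric structure.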

