Semantic similarity prediction is better than other semantic similarity measures
This work addresses the need for more accurate semantic similarity measures in natural language processing, though the contribution appears incremental, since it builds on existing fine-tuning techniques.
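The paper tackles the problem of measuring semantic similarity between texts by proposing STSScore, which predicts similarity directly with a model fine-tuned for the STS-B task, and argues that the resulting scores align better with expectations for a robust similarity measure than existing methods such as BLEU, BERTScore, and S-BERT.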
Semantic similarity between natural language texts is typically measured either by looking at the overlap between subsequences (e.g., BLEU) or by using embeddings (e.g., BERTScore, S-BERT). In this paper, we argue that when we are only interested in measuring semantic similarity, it is better to predict the similarity directly using a model fine-tuned for that task. Using a model fine-tuned for the Semantic Textual Similarity Benchmark (STS-B) task from the GLUE benchmark, we define the STSScore approach and show that the resulting similarity is better aligned with our expectations of a robust semantic similarity measure than other approaches.
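The contrast between overlap-based and directly predicted similarity can be illustrated with a minimal sketch. It is not the paper's implementation: `ngram_precision` is a simplified stand-in for BLEU-style overlap (single n, single reference, no brevity penalty), and `predicted_similarity` is a hypothetical wrapper around any callable that plays the role of a model fine-tuned on STS-B. Rescaling by 5 is an assumption based on STS-B labels ranging from 0 to 5; the abstract does not state how the score is normalized.

```python
from collections import Counter
from typing import Callable, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram precision, a simplified stand-in for overlap measures
    such as BLEU (whitespace tokens, one n, one reference, no brevity penalty)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def predicted_similarity(
    predict: Callable[[str, str], float],
    text_a: str,
    text_b: str,
    max_label: float = 5.0,  # assumption: STS-B style labels in [0, 5]
) -> float:
    """Rescale a regressor's raw output to [0, 1], clipping out-of-range values.

    `predict` is hypothetical: any function mapping a sentence pair to a raw
    similarity score, e.g. a wrapper around a model fine-tuned on STS-B.
    """
    raw = predict(text_a, text_b)
    return min(max(raw / max_label, 0.0), 1.0)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    paraphrase = "a feline rested on the rug"
    # Lexical overlap is low even though the two sentences are close in meaning.
    print(ngram_precision(paraphrase, reference))

    # Placeholder predictor with a made-up constant output, not a real model.
    def stub(a: str, b: str) -> float:
        return 4.0

    print(predicted_similarity(stub, paraphrase, reference))
```

Passing the predictor in as a callable keeps the sketch independent of any particular model or library; in practice it would wrap whichever STS-B regressor is available.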