CL, Jan 25

Distance-to-Distance Ratio: A Similarity Measure for Sentences Based on Rate of Change in LLM Embeddings

arXiv:2601.17705v1
Originality: Incremental advance
AI Analysis

This paper addresses text similarity measurement for natural language processing applications. The advance appears incremental because it builds on existing embedding methods.

The paper asks how to measure similarity between text embeddings in a way that aligns with human perception. It introduces the distance-to-distance ratio (DDR), which relates the similarity between pre-context word embeddings to the similarity between post-context LLM embeddings as a rate of change. In experiments with word perturbations, the authors report that DDR consistently discriminates more finely between semantically similar and dissimilar texts than the other similarity metrics they compare against.

A measure of similarity between text embeddings can be considered adequate only if it adheres to the human perception of similarity between texts. In this paper, we introduce the distance-to-distance ratio (DDR), a novel measure of similarity between LLM sentence embeddings. Inspired by Lipschitz continuity, DDR measures the rate of change between the similarity of the pre-context word embeddings and the similarity of the post-context LLM embeddings, thus measuring the semantic influence of context. We evaluate the performance of DDR in experiments designed as a series of perturbations applied to sentences drawn from a sentence dataset. For each sentence, we generate variants by replacing one, two, or three words with either synonyms, which constitute semantically similar text, or randomly chosen words, which constitute semantically dissimilar text. We compare the performance of DDR with other prevailing similarity metrics and demonstrate that DDR consistently provides finer discrimination between semantically similar and dissimilar texts, even under minimal, controlled edits.
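
The abstract does not state the DDR formula, so the following is only a rough sketch of the general idea, not the authors' definition. It assumes a Lipschitz-style quotient (post-context distance divided by pre-context distance), Euclidean distance, and pre-context word embeddings that have already been aggregated into one vector per sentence; all three are assumptions, and `distance_ratio`, `synonym_variant`, and `random_variant` are hypothetical names. The two variant helpers illustrate the perturbation setup described above: replace one, two, or three words with synonyms or with randomly chosen words.

```python
import random
import numpy as np


def euclidean(u, v):
    """Euclidean distance between two vectors (the metric is an assumption)."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))


def distance_ratio(pre_a, pre_b, post_a, post_b, eps=1e-12):
    """Lipschitz-style quotient: post-context distance over pre-context distance.

    ASSUMPTION: the paper's exact DDR definition may differ (direction of the
    ratio, distance function, aggregation of word embeddings).
    """
    d_pre = euclidean(pre_a, pre_b)
    d_post = euclidean(post_a, post_b)
    return d_post / max(d_pre, eps)  # eps guards against identical pre-context vectors


def synonym_variant(tokens, k, synonyms, rng):
    """Replace k words with a synonym (semantically similar variant).

    `synonyms` is a hypothetical word -> list-of-synonyms mapping.
    """
    eligible = [i for i, t in enumerate(tokens) if synonyms.get(t)]
    out = list(tokens)
    for i in rng.sample(eligible, k):
        out[i] = rng.choice(synonyms[tokens[i]])
    return out


def random_variant(tokens, k, vocab, rng):
    """Replace k words with randomly chosen, different words (dissimilar variant)."""
    out = list(tokens)
    for i in rng.sample(range(len(tokens)), k):
        out[i] = rng.choice([w for w in vocab if w != tokens[i]])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["the", "quick", "dog", "ran", "home"]
    toy_synonyms = {"quick": ["fast"], "dog": ["hound"], "ran": ["sprinted"]}
    toy_vocab = ["table", "green", "sing", "cloud", "seven"]
    for k in (1, 2, 3):
        print(k, synonym_variant(sentence, k, toy_synonyms, rng),
              random_variant(sentence, k, toy_vocab, rng))
```

In an actual experiment, the pre- and post-context vectors would come from an LLM's input embedding layer and its contextual output, and the ratio would be computed between each original sentence and each of its variants.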

