CL · Sep 24, 2019

Situating Sentence Embedders with Nearest Neighbor Overlap

arXiv:1909.10724v1 · 0.96 citations

Originality: Synthesis-oriented

AI Analysis

N2O gives NLP researchers and practitioners a comparative tool for evaluating embedders beyond benchmark scores, though the contribution is incremental because it builds on existing embedding methods.

The paper addresses the problem of comparing sentence embedders without relying on benchmark tasks. It proposes nearest neighbor overlap (N2O), a task-agnostic method that quantifies the similarity of two embedders by the overlap in their inputs' nearest neighbors, and uses it to analyze the effects of design choices and architectures.

As distributed approaches to natural language semantics have developed and diversified, embedders for linguistic units larger than words have come to play an increasingly important role. To date, such embedders have been evaluated using benchmark tasks (e.g., GLUE) and linguistic probes. We propose a comparative approach, nearest neighbor overlap (N2O), that quantifies similarity between embedders in a task-agnostic manner. N2O requires only a collection of examples and is simple to understand: two embedders are more similar if, for the same set of inputs, there is greater overlap between the inputs' nearest neighbors. Though applicable to embedders of texts of any size, we focus on sentence embedders and use N2O to show the effects of different design choices and architectures.
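The overlap idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `nearest_neighbors` and `n2o`, the use of cosine similarity, the exclusion of each input from its own neighbor list, and the normalization by `k` are all assumptions made here for concreteness.

```python
import numpy as np


def nearest_neighbors(embeddings: np.ndarray, k: int) -> np.ndarray:
    """For each row, return the indices of its k nearest other rows.

    Assumption: nearness is measured by cosine similarity.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid division by zero
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # an input is not its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]


def n2o(emb_a: np.ndarray, emb_b: np.ndarray, k: int) -> float:
    """Mean fraction of shared k-nearest neighbors between two embedders.

    emb_a[i] and emb_b[i] must be the two embedders' vectors for the same
    input; the embedding dimensions may differ.
    """
    n = emb_a.shape[0]
    if emb_b.shape[0] != n:
        raise ValueError("both embedders must embed the same inputs")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of inputs")
    nn_a = nearest_neighbors(emb_a, k)
    nn_b = nearest_neighbors(emb_b, k)
    # Per-input overlap: shared neighbor indices, normalized by k.
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb_a = rng.normal(size=(50, 16))  # stand-in for embedder A's output
    emb_b = rng.normal(size=(50, 32))  # stand-in for embedder B's output
    print("A vs A:", n2o(emb_a, emb_a, k=5))
    print("A vs B:", n2o(emb_a, emb_b, k=5))
```

Under these assumptions the score lies between 0 and 1, and an embedder compared with itself scores 1. Because only neighbor indices are compared, the two embedders do not need to share a vector space or dimensionality.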
