CL, Jun 4, 2019

Pitfalls in the Evaluation of Sentence Embeddings

arXiv:1906.01575v11095 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses evaluation challenges facing NLP researchers. It is incremental in that it compiles known issues rather than introducing new methods.

The paper identifies key pitfalls in evaluating sentence embeddings, such as comparing embeddings of different sizes, normalization of embeddings, and low (and diverging) correlations between transfer and probing tasks, and recommends better evaluation practices.

Deep learning models continuously break new records across different NLP tasks. At the same time, their success exposes weaknesses of model evaluation. Here, we compile several key pitfalls of evaluation of sentence embeddings, a currently very popular NLP paradigm. These pitfalls include the comparison of embeddings of different sizes, normalization of embeddings, and the low (and diverging) correlations between transfer and probing tasks. Our motivation is to challenge the current evaluation of sentence embeddings and to provide an easy-to-access reference for future research. Based on our insights, we also recommend better practices for better future evaluations of sentence embeddings.
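One way to make the "low (and diverging) correlations" pitfall concrete is to compare how two tasks rank the same set of encoders. The following is a minimal sketch, not the paper's own procedure: it assumes Spearman rank correlation as the measure, and the function names and all scores are hypothetical, made up purely for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the current value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores of four encoders (illustrative numbers, not real results).
transfer_scores = [80.1, 78.4, 82.3, 75.0]  # one transfer task
probing_scores = [60.2, 71.5, 58.9, 65.3]   # one probing task

rho = spearman(transfer_scores, probing_scores)
print(f"Spearman correlation between the two rankings: {rho:.2f}")
```

A value near 1 would mean the two tasks order the encoders alike; a value near 0 or below would mean that picking the "best" encoder depends heavily on which task is consulted.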
