CL LG ML

Feb 29, 2020

Understanding the Downstream Instability of Word Embeddings

Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré

arXiv:2003.04983v11.316 citations

Has Code

Originality: Incremental advance

AI Analysis

This paper addresses prediction instability in ML systems, a concern for NLP practitioners who retrain models frequently, and offers a method for reducing it. The advance is incremental, since it builds on existing embedding techniques.

The paper investigates the instability of downstream NLP models caused by changes in pre-trained word embeddings, revealing a tradeoff where increasing embedding memory by 2x reduces prediction disagreement by 5% to 37% relative. It introduces an eigenspace instability measure to theoretically bound this effect and shows it can guide parameter selection to minimize instability without training downstream models.
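To make the "relative" figure concrete, here is a minimal sketch of how prediction disagreement and its relative reduction could be computed. It assumes disagreement means the fraction of examples on which two downstream models predict different labels. The function names and the label lists are hypothetical illustrations, not data or code from the paper.

```python
from typing import Sequence


def prediction_disagreement(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of examples on which two models predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if len(preds_a) == 0:
        raise ValueError("prediction lists must be non-empty")
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)


def relative_reduction(baseline: float, improved: float) -> float:
    """Drop from `baseline` to `improved`, expressed as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical labels (made up for illustration, not results from the paper).
# "old"/"new" are models trained on embeddings from slightly different data.
small_old = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
small_new = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
large_old = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
large_new = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

d_small = prediction_disagreement(small_old, small_new)  # lower-memory embedding
d_large = prediction_disagreement(large_old, large_new)  # higher-memory embedding
reduction = relative_reduction(d_small, d_large)
print(f"disagreement: {d_small:.2f} -> {d_large:.2f}, relative reduction {reduction:.0%}")
```

A relative reduction is measured against the baseline disagreement, so going from 40% to 30% disagreement is a 25% relative reduction, not a 10% one.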

Many industrial machine learning (ML) systems require frequent retraining to keep up to date with constantly changing data. This retraining exacerbates a large challenge facing ML systems today: model training is unstable, i.e., small changes in training data can cause significant changes in the model's predictions. In this paper, we work on developing a deeper understanding of this instability, with a focus on how a core building block of modern natural language processing (NLP) pipelines, pre-trained word embeddings, affects the instability of downstream NLP models. We first empirically reveal a tradeoff between stability and memory: increasing the embedding memory 2x can reduce the disagreement in predictions due to small changes in training data by 5% to 37% (relative). To theoretically explain this tradeoff, we introduce a new measure of embedding instability, the eigenspace instability measure, which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings. Practically, we show that the eigenspace instability measure can be a cost-effective way to choose embedding parameters to minimize instability without training downstream models, outperforming other embedding distance measures and performing competitively with a nearest neighbor-based measure. Finally, we demonstrate that the observed stability-memory tradeoffs extend to other types of embeddings as well, including knowledge graph and contextual word embeddings.
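The abstract does not spell out the eigenspace instability measure, so the following is only a sketch of one subspace-comparison quantity in that spirit, not the paper's definition. It assumes the measure compares the spans of the left singular vectors U and U_t of two embedding matrices (rows are words) through tr((U U^T + U_t U_t^T - 2 U_t U_t^T U U^T) S) / tr(S). The weighting matrix S (`sigma`), its identity default, and all function names are assumptions made for illustration.

```python
from typing import Optional

import numpy as np


def _left_basis(X: np.ndarray) -> np.ndarray:
    """Orthonormal basis for the column space of X (left singular vectors)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    if s.size == 0 or s[0] == 0:
        return U[:, :0]
    tol = max(X.shape) * np.finfo(float).eps * s[0]
    rank = int(np.sum(s > tol))
    return U[:, :rank]


def subspace_instability(
    X: np.ndarray, X_tilde: np.ndarray, sigma: Optional[np.ndarray] = None
) -> float:
    """Weighted mismatch between the column spaces of two embedding matrices.

    X and X_tilde have one row per word; their dimensions may differ.
    `sigma` is an assumed positive semidefinite weighting matrix (n x n);
    the identity is used when none is given.
    """
    n = X.shape[0]
    if X_tilde.shape[0] != n:
        raise ValueError("both embeddings must cover the same n words")
    if sigma is None:
        sigma = np.eye(n)
    U = _left_basis(X)
    U_t = _left_basis(X_tilde)
    P = U @ U.T          # projector onto span of X's columns
    P_t = U_t @ U_t.T    # projector onto span of X_tilde's columns
    M = P + P_t - 2.0 * P_t @ P
    return float(np.trace(M @ sigma) / np.trace(sigma))


# Hypothetical embeddings: 6 words, 2 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))
same_span = X @ np.array([[2.0, 1.0], [0.0, 3.0]])  # invertible remix of columns
other = rng.standard_normal((6, 2))

print(subspace_instability(X, same_span))  # close to 0: identical column space
print(subspace_instability(X, other))      # larger: different column space
```

Under these assumptions the quantity is zero whenever the two embeddings span the same subspace, even if individual coordinates differ, which is why such a measure can be evaluated on the embeddings alone without training a downstream model.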
