CL · Sep 30, 2020

Multiple Word Embeddings for Increased Diversity of Representation

arXiv:2009.14394v2 · 11 citations · Has Code
AI Analysis

This incremental improvement targets efficiency and representational diversity for NLP practitioners who use pre-trained embeddings.

The paper responds to the high computational cost of contextual embeddings in NLP with a cheaper alternative: concatenating multiple pre-trained embeddings. According to the authors, this substantially and consistently improves performance over a strong baseline across many tasks, datasets, and model types, with a negligible increase in run time.

Most state-of-the-art models in natural language processing (NLP) are neural models built on top of large, pre-trained, contextual language models that generate representations of words in context and are fine-tuned for the task at hand. The improvements afforded by these "contextual embeddings" come with a high computational cost. In this work, we explore a simple technique that substantially and consistently improves performance over a strong baseline with negligible increase in run time. We concatenate multiple pre-trained embeddings to strengthen our representation of words. We show that this concatenation technique works across many tasks, datasets, and model types. We analyze aspects of pre-trained embedding similarity and vocabulary coverage and find that the representational diversity between different pre-trained embeddings is the driving force of why this technique works. We provide open source implementations of our models in both TensorFlow and PyTorch.
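
The concatenation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tables below hold made-up toy values rather than real pre-trained vectors, the helper name `concat_embeddings` is hypothetical, and the choice of a zero vector for a token missing from one table is an assumption made only to keep the example short.

```python
from typing import Dict, List, Sequence

# A toy "pre-trained" embedding table: token -> vector.
EmbeddingTable = Dict[str, List[float]]


def concat_embeddings(
    token: str,
    tables: Sequence[EmbeddingTable],
    dims: Sequence[int],
) -> List[float]:
    """Look up `token` in every table and concatenate the vectors.

    Assumption of this sketch: a table that lacks the token
    contributes a zero vector of that table's dimensionality.
    """
    vector: List[float] = []
    for table, dim in zip(tables, dims):
        vector.extend(table.get(token, [0.0] * dim))
    return vector


# Two tables with different dimensionality and different vocabularies
# (values are invented for illustration).
table_a: EmbeddingTable = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}
table_b: EmbeddingTable = {"cat": [0.5, 0.6, 0.7], "zebra": [0.8, 0.9, 1.0]}
tables = [table_a, table_b]
dims = [2, 3]

# Each token becomes one vector of length 2 + 3 = 5.
sentence = ["cat", "zebra"]
features = [concat_embeddings(tok, tables, dims) for tok in sentence]
```

In this toy setup "zebra" is absent from the first table but present in the second, so the combined vector still carries information about it. That loosely mirrors the vocabulary-coverage aspect the abstract mentions, though the abstract attributes the gains primarily to representational diversity.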

Code Implementations: 1 repo
