CL · Jul 2, 2018

Transparent, Efficient, and Robust Word Embedding Access with WOMBAT

arXiv:1807.00717v1 · 1088 citations
Originality: Synthesis-oriented
AI Analysis

This tool addresses efficiency and usability problems that NLP practitioners face, but the contribution is incremental, since it builds on existing embedding methods.

The authors address the inefficient and cumbersome access to word embeddings in NLP research with WOMBAT, a Python tool for faster and more streamlined embedding access. A Python script using WOMBAT evaluates seven embedding collections (8.7M vectors in total) end-to-end on a SemEval sentence similarity task in under ten seconds on a standard notebook computer.

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.
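The efficiency claim rests on not loading entire embedding files into memory. The following is a minimal sketch of that general idea under our own assumptions: vectors are kept in a SQLite table, only the words that occur in the input sentences are fetched, and one fixed preprocessing function is applied to every query. The class `EmbeddingStore`, the function `preprocess`, the collection name `"toy"`, and the averaging-plus-cosine sentence scoring are all hypothetical illustrations; they are not WOMBAT's actual API or the paper's evaluation code.

```python
import math
import sqlite3
import struct


def preprocess(text):
    # Hypothetical preprocessor: lowercase + whitespace split. Applying the same
    # function to every query keeps lookups consistent across experiments.
    return text.lower().split()


class EmbeddingStore:
    """Toy SQLite-backed embedding store (illustrative, not the WOMBAT API)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS vectors ("
            "collection TEXT, word TEXT, vec BLOB, PRIMARY KEY (collection, word))"
        )

    def add(self, collection, word, vector):
        # Serialize the vector as little-endian float32 bytes.
        blob = struct.pack(f"<{len(vector)}f", *vector)
        self.conn.execute(
            "INSERT OR REPLACE INTO vectors VALUES (?, ?, ?)", (collection, word, blob)
        )

    def lookup(self, collection, words):
        # Fetch only the requested words; out-of-vocabulary words are simply absent.
        unique = sorted(set(words))
        if not unique:
            return {}
        marks = ",".join("?" * len(unique))
        rows = self.conn.execute(
            f"SELECT word, vec FROM vectors WHERE collection = ? AND word IN ({marks})",
            [collection, *unique],
        )
        return {w: list(struct.unpack(f"<{len(b) // 4}f", b)) for w, b in rows}


def sentence_vector(store, collection, sentence):
    # Average the vectors of all in-vocabulary tokens (None if no token is known).
    tokens = preprocess(sentence)
    table = store.lookup(collection, tokens)
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(store, collection, s1, s2):
    # Score one sentence pair; None if either sentence has no known tokens.
    v1 = sentence_vector(store, collection, s1)
    v2 = sentence_vector(store, collection, s2)
    if v1 is None or v2 is None:
        return None
    return cosine(v1, v2)


if __name__ == "__main__":
    store = EmbeddingStore()
    for word, vec in [("cat", [1.0, 0.0]), ("dog", [0.5, 0.5]), ("car", [0.0, 1.0])]:
        store.add("toy", word, vec)
    print(similarity(store, "toy", "The cat", "the dog"))  # ~0.7071
    print(similarity(store, "toy", "A cat", "a car"))  # 0.0
```

Passing a file path instead of `":memory:"` keeps the table on disk, so each lookup reads only the rows it needs rather than the whole collection. The `collection` argument stands in for unified access: several embedding sets can sit in one table and be queried through the same interface.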

Code Implementations: 1 repo
