CL · Sep 18, 2019

Hierarchical Meta-Embeddings for Code-Switching Named Entity Recognition

arXiv:1909.08504v1 · 1014 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of code-switching for NLP applications in multilingual countries and represents an incremental improvement over previous word-level approaches.

The paper tackles Named Entity Recognition for English-Spanish code-switching by proposing Hierarchical Meta-Embeddings (HME) that combine monolingual word-level and subword-level embeddings, achieving state-of-the-art performance in multilingual settings.

In countries where multiple main languages are spoken, mixing different languages within a conversation is commonly called code-switching. Previous works addressing this challenge mainly focused on word-level aspects such as word embeddings. However, in many cases, languages share common subwords, especially closely related languages, but also languages that are seemingly unrelated. Therefore, we propose Hierarchical Meta-Embeddings (HME) that learn to combine multiple monolingual word-level and subword-level embeddings to create language-agnostic lexical representations. On the task of Named Entity Recognition for English-Spanish code-switching data, our model achieves state-of-the-art performance in the multilingual setting. We also show that, in cross-lingual settings, our model not only leverages closely related languages, but also learns from languages with different roots. Finally, we show that combining different subunits is crucial for capturing code-switching entities.
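The abstract does not say how the embeddings are combined, so the following is only a minimal sketch of one common way to build a meta-embedding: project each source embedding into a shared space, score each projection, and take a softmax-weighted sum. The function name `meta_embed`, the dimensions, the random weights, and the scalar scoring vector are all assumptions made for illustration, not the authors' architecture. It also shows a single combination level only, whereas the paper applies the idea to both word-level and subword-level embeddings.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned projection matrix (hypothetical weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def meta_embed(source_vectors, projections, scorer):
    """Combine one token's embeddings from several sources into one vector.

    source_vectors: one vector per embedding source (dimensions may differ)
    projections:    one (shared_dim x source_dim) matrix per source
    scorer:         vector of length shared_dim giving a scalar score
    """
    # 1. Map every source into the same shared space.
    projected = [matvec(p, v) for p, v in zip(projections, source_vectors)]
    # 2. Score each projected vector and normalise the scores.
    scores = [sum(a * x for a, x in zip(scorer, vec)) for vec in projected]
    weights = softmax(scores)
    # 3. Weighted sum of the projected vectors.
    shared_dim = len(scorer)
    combined = [
        sum(w * vec[i] for w, vec in zip(weights, projected))
        for i in range(shared_dim)
    ]
    return combined, weights


rng = random.Random(0)
shared_dim = 4
# Hypothetical sources: an English word vector, a Spanish word vector,
# and a subword-derived vector, each with a different dimensionality.
source_dims = [5, 3, 6]
source_vectors = [[rng.gauss(0, 1) for _ in range(d)] for d in source_dims]
projections = [random_matrix(shared_dim, d, rng) for d in source_dims]
scorer = [rng.uniform(-0.1, 0.1) for _ in range(shared_dim)]

combined, weights = meta_embed(source_vectors, projections, scorer)
```

In a trained model the projection matrices and the scorer would be learned jointly with the NER tagger, so the weights could shift per token toward whichever source is most informative; here they are random placeholders.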

Code Implementations: 1 repo
