CL · Aug 10, 2016

Hierarchical Character-Word Models for Language Identification

arXiv:1608.03030v1 · 45 citations
Originality: Incremental advance
AI Analysis

This paper addresses language identification for social media analysis, but the advance is incremental because it builds on existing hierarchical approaches.

The authors tackled language identification in short social media texts with unconventional spelling by introducing a hierarchical character-word model, achieving strong performance against baselines and enabling code-switching detection.

Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines, and can also reveal code-switching.
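The data flow the abstract describes (characters, then words in context, then a message-level decision) can be illustrated with a minimal sketch. This is not the authors' model: the learned representations are swapped for character-trigram counts scored by cosine similarity against per-language profiles, and context is approximated by mixing each word's scores with those of its neighbours. The lexicon, the function names, and the `context_weight` parameter are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption); a real system would learn from labelled messages.
LEXICON = {
    "en": ["the", "thanks", "hello", "friend", "good", "morning"],
    "es": ["hola", "gracias", "amigo", "buenos", "dias", "que"],
}


def char_trigrams(word):
    """Character level: represent a word by its padded character trigrams."""
    padded = "^" + word.lower() + "$"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(lexicon):
    """Sum the trigram counts of each language's words into one profile."""
    profiles = {}
    for lang, words in lexicon.items():
        profile = Counter()
        for word in words:
            profile.update(char_trigrams(word))
        profiles[lang] = profile
    return profiles


def contextualize(scores, context_weight=0.25):
    """Word level: mix each word's scores with the mean of its neighbours' scores."""
    mixed = []
    for i, own in enumerate(scores):
        neighbours = [scores[j] for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        row = {}
        for lang, value in own.items():
            if neighbours:
                context = sum(n[lang] for n in neighbours) / len(neighbours)
            else:
                context = value  # a single-word message has no context
            row[lang] = (1 - context_weight) * value + context_weight * context
        mixed.append(row)
    return mixed


def identify(message, profiles):
    """Return (message language, per-word languages, code-switching flag)."""
    words = message.split()
    if not words:
        return None, [], False
    local = [
        {lang: cosine(char_trigrams(w), prof) for lang, prof in profiles.items()}
        for w in words
    ]
    contextual = contextualize(local)
    word_labels = [max(row, key=row.get) for row in contextual]
    totals = {lang: sum(row[lang] for row in contextual) for lang in profiles}
    return max(totals, key=totals.get), word_labels, len(set(word_labels)) > 1


if __name__ == "__main__":
    profiles = build_profiles(LEXICON)
    print(identify("hello amigo", profiles))
```

Keeping per-word labels alongside the message-level label is what lets a hierarchical design surface code-switching: a message whose words receive different labels is flagged, even though a single language is still reported for the message as a whole.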

Code Implementations: 1 repo
