CL · Aug 21, 2018

Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture

arXiv:1808.07118v11092 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for accurate language identification tools for building NLP systems on code-mixed data, representing an incremental improvement in a domain-specific area.

The paper tackles language identification in code-mixed data by implementing multichannel neural networks combining CNN and LSTM with a Bi-LSTM-CRF context capture module, achieving accuracies of 93.28% and 93.32% on two testing sets.

An accurate language identification tool is an absolute necessity for building complex NLP systems to be used on code-mixed data. A lot of work has recently been done on this problem, but there is still room for improvement. Inspired by recent advances in neural network architectures for computer vision tasks, we have implemented multichannel neural networks combining CNN and LSTM for word-level language identification of code-mixed data. Combining this with a Bi-LSTM-CRF context capture module, accuracies of 93.28% and 93.32% are achieved on our two testing sets.
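The abstract does not spell out how the context capture module works internally, so the following is only a toy sketch of the general idea behind a CRF layer: per-word language scores are combined with transition scores between neighbouring tags, and the best tag sequence is found with Viterbi decoding. The function name `viterbi`, the example words, and every score below are assumptions made up for illustration; in a real system the per-word scores would come from a trained network and the transitions would be learned.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model."""
    if not emissions:
        return []
    # best[i][t]: best score of any tag path ending in tag t at word i
    best = [{t: emissions[0][t] for t in tags}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximises path score + transition
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[i][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical romanised Hindi sentence; "to" is ambiguous between English and Hindi.
words = ["yeh", "to", "sahi", "hai"]
tags = ["en", "hi"]
# Made-up word-level scores (stand-ins for the output of a word-level classifier).
emissions = [
    {"en": 0.1, "hi": 2.0},
    {"en": 1.0, "hi": 0.8},
    {"en": 0.1, "hi": 2.0},
    {"en": 0.2, "hi": 2.0},
]
# Made-up transitions: staying in the same language is mildly rewarded.
transitions = {(a, b): (0.5 if a == b else -0.5) for a in tags for b in tags}

word_only = [max(tags, key=lambda t: e[t]) for e in emissions]
with_context = viterbi(emissions, transitions, tags)
print(list(zip(words, word_only)))
print(list(zip(words, with_context)))
```

With these illustrative numbers, labelling each word in isolation tags "to" as English, while decoding over the whole sequence tags it as Hindi because the surrounding words outweigh the small word-level preference.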
