CL · LG · Aug 14, 2019

Raw-to-End Name Entity Recognition in Social Media

arXiv:1908.05344v1 · 3 citations · Has Code
AI Analysis

This work addresses pre-processing errors (for example, tokenization mistakes) that degrade NER performance on noisy texts such as tweets, and offers a more robust alternative to word-level models.

The paper tackles named entity recognition in noisy social media texts by introducing Neural-Char-CRF, a raw-to-end framework that operates on raw character sequences and so avoids errors introduced by pre-processing steps such as tokenization. It achieves state-of-the-art results on two public datasets.

Taking word sequences as the input, typical named entity recognition (NER) models neglect errors from pre-processing (e.g., tokenization). However, these errors can influence the model performance greatly, especially for noisy texts like tweets. Here, we introduce Neural-Char-CRF, a raw-to-end framework that is more robust to pre-processing errors. It takes raw character sequences as inputs and makes end-to-end predictions. Word embedding and contextualized representation models are further tailored to capture textual signals for each character instead of each word. Our model neither requires the conversion from character sequences to word sequences, nor assumes that the tokenizer can correctly detect all word boundaries. Moreover, we observe that our model's performance remains unchanged after replacing tokenization with string matching, which demonstrates its potential to be tokenization-free. Extensive experimental results on two public datasets demonstrate the superiority of our proposed method over the state of the art. The implementations and datasets are made available at: https://github.com/LiyuanLucasLiu/Raw-to-End.
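To make the character-level framing in the abstract concrete, here is a minimal sketch of two steps it implies: locating word strings in the raw text by plain string matching, so that word-level signals can be attached to individual characters, and expressing entity spans as one label per character. This is not the authors' implementation. The function names, the BIO label scheme, and the example sentence are all assumptions made for illustration; the linked repository is the place to check how the paper actually does it.

```python
def align_tokens_to_chars(raw, tokens):
    """Map each character of `raw` to the index of the token covering it.

    Tokens are located left to right by plain string matching (an assumed,
    simplified stand-in for the string matching mentioned in the abstract).
    Characters not covered by any token, such as spaces, get -1.
    """
    char_to_token = [-1] * len(raw)
    cursor = 0
    for idx, tok in enumerate(tokens):
        start = raw.find(tok, cursor)
        if start < 0:
            continue  # token not found verbatim; its characters stay unassigned
        for pos in range(start, start + len(tok)):
            char_to_token[pos] = idx
        cursor = start + len(tok)
    return char_to_token


def spans_to_char_labels(raw, spans):
    """Turn (start, end, type) entity spans (end exclusive) into per-character labels.

    The BIO scheme here is an assumption chosen for illustration.
    """
    labels = ["O"] * len(raw)
    for start, end, etype in spans:
        labels[start] = "B-" + etype
        for pos in range(start + 1, end):
            labels[pos] = "I-" + etype
    return labels


raw = "I love NYC!"
tokens = ["I", "love", "NYC", "!"]
alignment = align_tokens_to_chars(raw, tokens)
labels = spans_to_char_labels(raw, [(7, 10, "LOC")])
```

With an alignment like this, a word-level vector can be looked up once per token and copied to every character that token covers, while the sequence model and its predictions stay at the character level. A missed or wrong word boundary then weakens one input signal but does not change the units being labelled.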

Code Implementations: 1 repo