CL, Sep 14, 2017

Synapse at CAp 2017 NER challenge: Fasttext CRF

arXiv:1709.04820v1, 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses named entity recognition in noisy, informal text such as French tweets. It is incremental in that it applies existing methods to a new domain.

The paper tackles named entity recognition on French tweets by using unsupervised learning on a larger tweet dataset to generate features for a CRF model. The system placed first in the CAp 2017 challenge with an F-measure of 58.89%.

We present our system for the CAp 2017 NER challenge which is about named entity recognition on French tweets. Our system leverages unsupervised learning on a larger dataset of French tweets to learn features feeding a CRF model. It was ranked first without using any gazetteer or structured external data, with an F-measure of 58.89%. To the best of our knowledge, it is the first system to use fasttext embeddings (which include subword representations) and an embedding-based sentence representation for NER.
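
To make the idea of subword-based embeddings feeding a CRF more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes fastText-style character n-grams (lengths 3 to 6, with `<` and `>` as word boundary markers), replaces the trained embedding table with a hash-based stand-in (`toy_ngram_vector`, hypothetical), and assumes the sentence representation is a plain average of word vectors. The function names, the 4-dimensional vectors, and the feature key names are all illustrative.

```python
import zlib
from typing import Dict, List

DIM = 4  # toy dimensionality (assumption); trained models use far more


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def toy_ngram_vector(gram: str) -> List[float]:
    """Hypothetical stand-in for a trained lookup table: a deterministic
    pseudo-vector in [-1, 1] derived from a CRC32 hash of the n-gram."""
    return [
        ((zlib.crc32(f"{gram}:{d}".encode("utf-8")) % 2001) - 1000) / 1000.0
        for d in range(DIM)
    ]


def word_vector(word: str) -> List[float]:
    """Average the vectors of the whole word and of its character n-grams,
    so unseen or misspelled words still receive a representation."""
    grams = char_ngrams(word) + [f"<{word}>"]
    vectors = [toy_ngram_vector(g) for g in grams]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sentence_vector(tokens: List[str]) -> List[float]:
    """Assumed sentence representation: the mean of the word vectors."""
    vectors = [word_vector(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def token_features(tokens: List[str], i: int) -> Dict[str, float]:
    """One feature dict per token: word embedding plus sentence embedding."""
    features = {"bias": 1.0}
    for d, value in enumerate(word_vector(tokens[i])):
        features[f"w{d}"] = value
    for d, value in enumerate(sentence_vector(tokens)):
        features[f"s{d}"] = value
    return features


if __name__ == "__main__":
    tweet = ["bonjour", "Paris"]
    print(char_ngrams("Paris", 3, 3))
    print(token_features(tweet, 1))
```

A sequence of such per-token dicts, one list per tweet, is the kind of input a CRF toolkit would be trained on together with the gold entity labels. Because the vectors are built from character n-grams, a misspelled or previously unseen token in a tweet still shares n-grams with known words, which is presumably what makes subword representations attractive for noisy text.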

