CL | Aug 12, 2019

A Finnish News Corpus for Named Entity Recognition

arXiv:1908.04212v1 | 72 citations
AI Analysis

The paper provides a new dataset for Finnish NER research, but the contribution is incremental: it applies existing methods to new data.

The authors introduced a manually annotated Finnish news corpus for named entity recognition, containing 953 articles with six entity classes, and reported baseline experiments using a rule-based system and two deep learning systems on in-domain and out-of-domain test sets.

We present a corpus of Finnish news articles with manually prepared named entity annotation. The corpus consists of 953 articles (193,742 word tokens) annotated with six named entity classes (organization, location, person, product, event, and date). The articles are extracted from the archives of Digitoday, a Finnish online technology news source. The corpus is available for research purposes. We present baseline experiments on the corpus using a rule-based system and two deep learning systems on two test sets, one in-domain and one out-of-domain.

Code Implementations: 2 repos
