CL · IR · Nov 8, 2019

Char-RNN and Active Learning for Hashtag Segmentation

arXiv:1911.03270v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses hashtag segmentation for natural language processing applications. Its contribution is incremental, since it builds on existing char-RNN and active learning methods.

The authors tackle hashtag segmentation with a character recurrent neural network (char-RNN). The training data is synthetic, generated from frequent n-grams that match morpho-syntactic patterns, and an active learning strategy selects an informative subset of it. The approach needs no manual annotation or language-specific settings and is evaluated on two languages that differ in degree of inflection.

We explore the abilities of a character recurrent neural network (char-RNN) for hashtag segmentation. Our approach to the task is the following: to avoid any manual annotation, we generate a synthetic training dataset from frequent n-grams that satisfy predefined morpho-syntactic patterns. An active learning strategy limits the size of the training dataset by selecting an informative training subset. The approach does not require any language-specific settings and is compared across two languages that differ in degree of inflection.
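
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything named here is an assumption made for illustration: the per-character boundary labels, the helper names `make_example`, `segment`, `mean_entropy`, and `select_informative`, and the use of entropy-based uncertainty sampling as the active learning criterion (the abstract does not say which criterion is used). Filtering n-grams by morpho-syntactic patterns is omitted because it would require a part-of-speech tagger.

```python
import math
from typing import Callable, List, Sequence, Tuple


def make_example(ngram: Sequence[str]) -> Tuple[str, List[int]]:
    """Turn an n-gram of non-empty words into a synthetic hashtag.

    Returns the concatenated text and one label per character:
    1 if a word boundary follows that character, 0 otherwise.
    (This labelling scheme is an assumption, not taken from the paper.)
    """
    text = "".join(ngram).lower()
    labels = [0] * len(text)
    pos = 0
    for word in ngram[:-1]:  # no boundary after the last word
        pos += len(word)
        labels[pos - 1] = 1
    return text, labels


def segment(text: str, labels: Sequence[int]) -> List[str]:
    """Split text into words using per-character boundary labels."""
    words, start = [], 0
    for i, flag in enumerate(labels):
        if flag:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


def mean_entropy(probs: Sequence[float]) -> float:
    """Average binary entropy of per-character boundary probabilities."""
    def h(p: float) -> float:
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    if not probs:
        return 0.0
    return sum(h(p) for p in probs) / len(probs)


def select_informative(
    pool: Sequence[str],
    predict: Callable[[str], Sequence[float]],
    k: int,
) -> List[str]:
    """Pick the k pool items the model is least certain about.

    `predict` stands in for a trained char-RNN that returns a boundary
    probability for each character (hypothetical interface).
    """
    ranked = sorted(pool, key=lambda t: mean_entropy(predict(t)), reverse=True)
    return ranked[:k]
```

For example, `make_example(["machine", "learning"])` produces the string `machinelearning` with a single boundary label after the seventh character, and `segment` inverts that mapping. In a training loop under these assumptions, `select_informative` would be called on the synthetic pool after each round to choose which examples to add next.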
