Char-RNN and Active Learning for Hashtag Segmentation
This work addresses hashtag segmentation for natural language processing applications, but its contribution is incremental, since it builds on existing char-RNN and active learning methods.
The authors tackle hashtag segmentation with a character recurrent neural network (char-RNN) trained on synthetic data generated from frequent n-grams that match predefined morpho-syntactic patterns. An active learning strategy selects an informative training subset. The method requires no manual annotation or language-specific settings and is evaluated on two languages that differ in degree of inflection.
We explore the abilities of a character recurrent neural network (char-RNN) for hashtag segmentation. Our approach to the task is the following: to avoid any manual annotation, we generate a synthetic training dataset from frequent n-grams that satisfy predefined morpho-syntactic patterns. An active learning strategy limits the size of the training dataset by selecting an informative training subset. The approach does not require any language-specific settings and is evaluated on two languages that differ in degree of inflection.
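To make the data-generation and selection steps concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`make_example`, `segment`, `select_informative`) are hypothetical, the input n-grams are assumed to have already passed the frequency and morpho-syntactic filters, the per-character binary boundary labels are one plausible target encoding for a char-RNN, and mean entropy is used as a stand-in informativeness score because the text does not state the actual active learning criterion.

```python
import math


def make_example(ngram):
    """Turn a word n-gram into a synthetic (hashtag, labels) pair.

    labels[i] is 1 if a word boundary follows character i, else 0.
    Assumption: the n-gram was already filtered by frequency and pattern.
    """
    words = [w.lower() for w in ngram if w]
    hashtag = "".join(words)
    labels = []
    for idx, word in enumerate(words):
        is_last = idx == len(words) - 1
        # Every character is 0 except the last one of a non-final word.
        labels.extend([0] * (len(word) - 1) + [0 if is_last else 1])
    return hashtag, labels


def segment(hashtag, labels):
    """Rebuild the segmented text from a hashtag and its boundary labels."""
    out = []
    for ch, lab in zip(hashtag, labels):
        out.append(ch)
        if lab == 1:
            out.append(" ")
    return "".join(out)


def mean_entropy(probs):
    """Average binary entropy of per-character boundary probabilities."""
    def h(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return sum(h(p) for p in probs) / len(probs)


def select_informative(pool, k):
    """Pick the k hashtags the model is least certain about.

    pool is a list of (hashtag, probs), where probs are hypothetical
    boundary probabilities predicted by the current model.
    """
    ranked = sorted(pool, key=lambda item: mean_entropy(item[1]), reverse=True)
    return [hashtag for hashtag, _ in ranked[:k]]
```

For instance, `make_example(("machine", "learning"))` yields the hashtag `machinelearning` with a single boundary label after the seventh character, and `segment` applied to that pair recovers `machine learning`. In an active learning loop, `select_informative` would be called on model predictions over the synthetic pool to choose which examples to add to the training subset.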