CL, Oct 25, 2016

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

arXiv:1610.07844v1 15.865 citations

Originality: Incremental advance

AI Analysis

The paper addresses the challenge that variant spellings in historical documents pose for natural-language processing, though the advance is incremental because it builds on existing normalization methods.

The paper tackles historical spelling normalization by applying a deep bi-LSTM network at the character level. The model performs competitively with existing normalization algorithms on Early New High German texts, and multi-task learning with additional normalization data improves results further.

Natural-language processing of historical documents is complicated by the abundance of variant spellings and lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, particularly a deep bi-LSTM network applied on a character level. Our model compares well to previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can improve our model's performance further.
