CL, LG · Sep 6, 2020

Romanian Diacritics Restoration Using Recurrent Neural Networks

arXiv:2009.02743v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses a mandatory preprocessing step for Romanian language processing, offering a language-specific solution where previous neural methods were not optimized for Romanian.

The paper tackles the problem of diacritics restoration in Romanian texts by proposing a novel recurrent neural network architecture that attends to different levels of abstraction, achieving improved accuracy in restoring diacritics.

Diacritics restoration is a mandatory step for adequately processing Romanian texts, and not a trivial one, since restoring a character correctly generally requires context. Most previous methods evaluated for Romanian diacritics restoration do not use neural networks. Among those that do, none is specifically optimized for Romanian (i.e., they were generally designed to work across many different languages). We therefore propose a novel neural architecture based on recurrent neural networks that can attend to information at different levels of abstraction in order to restore diacritics.
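
To make the task concrete, the following is a minimal sketch of how diacritics restoration can be framed as a character-level labelling problem, and why context is needed. It is not the paper's architecture or data pipeline: the helper names (`strip_diacritics`, `make_training_pair`, `ambiguous_positions`) and the candidate table are hypothetical, chosen only for illustration.

```python
import unicodedata

# Romanian diacritic letters mapped to their base letters.
# Legacy cedilla forms (U+015F, U+0163 and capitals) are included because
# they still occur in real text alongside the comma-below forms.
STRIP_MAP = str.maketrans({
    "ă": "a", "â": "a", "î": "i", "\u0219": "s", "\u021b": "t",
    "Ă": "A", "Â": "A", "Î": "I", "\u0218": "S", "\u021a": "T",
    "\u015f": "s", "\u0163": "t", "\u015e": "S", "\u0162": "T",
})

# Assumption for illustration: each base letter can be restored to one of
# these candidates; every other character is copied unchanged.
CANDIDATES = {"a": "aăâ", "i": "iî", "s": "sș", "t": "tț"}


def strip_diacritics(text: str) -> str:
    """Remove Romanian diacritics, keeping the string length unchanged."""
    # NFC composes base letter + combining mark into a single code point,
    # so the one-to-one character mapping below applies.
    return unicodedata.normalize("NFC", text).translate(STRIP_MAP)


def make_training_pair(text: str) -> tuple[list[str], list[str]]:
    """Return (input characters, target characters), aligned one-to-one."""
    target = unicodedata.normalize("NFC", text)
    return list(strip_diacritics(target)), list(target)


def ambiguous_positions(stripped: str) -> list[int]:
    """Indices where a model would have to choose among several candidates."""
    return [i for i, ch in enumerate(stripped) if ch.lower() in CANDIDATES]


if __name__ == "__main__":
    # "pește" (fish) and "peste" (over) collapse to the same stripped form,
    # so only the surrounding context can tell which one was intended.
    for word in ("pește", "peste", "fată", "față"):
        print(word, "->", strip_diacritics(word))
```

In this framing, any sequence model, recurrent or otherwise, receives the stripped characters as input and predicts the original character at each position; only the positions returned by `ambiguous_positions` require a real decision.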

