CL · May 11, 2021

Restoring Hebrew Diacritics Without a Dictionary

arXiv:2105.05209v430.2630 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses Hebrew text processing for linguists and NLP applications with a resource-light solution. The advance is incremental, since it builds on existing LSTM methods.

It tackles the problem of diacritizing Hebrew script with no human-curated resources other than plain diacritized text, and reports performance on par with more complex, curation-dependent systems across diverse modern Hebrew sources.

We demonstrate that it is feasible to diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources.
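Because the only resource is plain diacritized text, training pairs for a character-level model can in principle be derived from the text itself. The following is a minimal sketch of that idea, not NAKDIMON's actual preprocessing: it assumes each character's label is simply the run of Unicode combining marks that follows it, and the function names (`split_diacritics`, `strip_diacritics`, `restore`) are hypothetical.

```python
import unicodedata


def split_diacritics(text):
    """Split text into (base_char, marks) pairs.

    Assumption: a diacritic is any code point with a nonzero Unicode
    canonical combining class (true of the Hebrew points), and it
    attaches to the nearest preceding base character.
    """
    pairs = []
    for ch in text:
        if unicodedata.combining(ch) and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def strip_diacritics(text):
    """Return the undiacritized input a model would see."""
    return "".join(base for base, _ in split_diacritics(text))


def restore(bases, labels):
    """Re-attach one (possibly empty) label string to each base character."""
    return "".join(base + marks for base, marks in zip(bases, labels))


# The word "shalom" with diacritics, written with escapes to avoid
# encoding ambiguity: shin + shin dot + qamats, lamed, vav + holam, final mem.
dotted = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"

pairs = split_diacritics(dotted)
bases = [base for base, _ in pairs]
labels = [marks for _, marks in pairs]
```

Here `bases` is the model input and `labels` is the per-character target. A sequence labeler such as a character-level LSTM would be trained to predict `labels` from `bases`, and `restore(bases, labels)` rebuilds the diacritized string.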
