CL · May 11, 2021

Restoring Hebrew Diacritics Without a Dictionary

arXiv:2105.05209v4630 citations
Originality: Incremental advance
AI Analysis

This paper addresses Hebrew diacritization for linguists and NLP applications with a resource-light solution. The advance is incremental, since it builds on existing LSTM methods.

The paper tackles the problem of diacritizing Hebrew script without human-curated resources, and reports performance on par with more complex curation-dependent systems across diverse modern Hebrew sources.

We demonstrate that it is feasible to diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources.
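To illustrate why plain diacritized text can be enough supervision for a character-level model, here is a minimal sketch of how such text might be split into undiacritized input characters and per-character diacritic labels. This is not the authors' code: the function names `split_diacritics` and `apply_diacritics` are hypothetical, and the sketch assumes that every Unicode combining mark attaches to the base character before it.

```python
import unicodedata


def split_diacritics(text):
    """Split diacritized text into base characters and per-character labels.

    Returns (base, labels), where labels[i] is the (possibly empty) string of
    combining marks that followed base[i] in the NFD-normalized input.
    Assumption: any character with a nonzero canonical combining class
    (this includes Hebrew niqqud) belongs to the preceding base character.
    """
    base, labels = [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and base:
            labels[-1] += ch      # attach the mark to the previous base char
        else:
            base.append(ch)       # a new base character starts a new slot
            labels.append("")
    return "".join(base), labels


def apply_diacritics(base, labels):
    """Inverse of split_diacritics: re-attach one label to each base char."""
    return "".join(c + marks for c, marks in zip(base, labels))


# "shalom" with niqqud, written with escapes so the example is encoding-safe.
diacritized = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
plain, marks = split_diacritics(diacritized)
restored = apply_diacritics(plain, marks)
```

In this framing, `plain` is what a model would receive as input and `marks` is the label sequence it would be trained to predict, one label per character. The round trip through `apply_diacritics` recovers the NFD-normalized original.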

Code Implementations: 1 repo
