Restoring Hebrew Diacritics Without a Dictionary
This work addresses a practical challenge in Hebrew text processing for linguists and NLP applications by offering a resource-light solution, though the contribution is incremental, building on existing LSTM methods.
The paper tackles the problem of diacritizing Hebrew script without human-curated resources and achieves performance on par with more complex, curation-dependent systems across diverse modern Hebrew sources.
We demonstrate that it is feasible to diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources.
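Plain diacritized text can be enough because diacritization can be framed as character-level tagging. Stripping the marks from dotted text gives the undotted input a character-level model would read, and the stripped marks become the per-character targets it would learn to predict. The sketch below is a hypothetical preprocessing step that illustrates this idea. It assumes one combined label per base character, built from all Unicode nonspacing marks (category "Mn") that follow it. The function names `split_diacritics`, `make_example` and `restore` are illustrative, and this is not presented as NAKDIMON's actual label scheme or pipeline.

```python
import unicodedata


def split_diacritics(text):
    """Split diacritized text into (base_char, marks) pairs.

    Assumption: every nonspacing mark (Unicode category "Mn", which covers
    Hebrew niqqud, dagesh and the shin/sin dots) belongs to the preceding
    base character. Every other character starts a new pair.
    """
    pairs = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def make_example(text):
    """Return (undotted_input, per_character_labels) for one training string."""
    pairs = split_diacritics(text)
    undotted = "".join(base for base, _ in pairs)
    labels = [marks for _, marks in pairs]
    return undotted, labels


def restore(undotted, labels):
    """Re-attach predicted labels to their base characters to get dotted text."""
    return "".join(base + marks for base, marks in zip(undotted, labels))


# "shalom" with niqqud: shin + qamats + shin dot, lamed, vav + holam, final mem.
word = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
x, y = make_example(word)
print(x == "\u05e9\u05dc\u05d5\u05dd")  # True
print(restore(x, y) == word)            # True
```

Because the input and label sequences are aligned one-to-one by construction, a character-level tagger only has to emit one label per input character. Restoring the text is then a simple zip, as `restore` shows.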