CL Jan 30, 2024

D-Nikud: Enhancing Hebrew Diacritization with LSTM and Pretrained Models

arXiv:2402.00075v1 1.91 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses diacritization for Hebrew language processing, but it appears incremental, since it builds on the existing Nakdimon approach and the TavBERT pre-trained model.

The paper tackles Hebrew diacritization by integrating LSTM networks with a BERT-based pre-trained model, reporting state-of-the-art results on benchmark datasets with a focus on modern texts and gender-specific diacritization.

We present D-Nikud, a novel approach to Hebrew diacritization that integrates the strengths of LSTM networks and a BERT-based (transformer) pre-trained model. Inspired by the methodologies employed in Nakdimon, which we integrate with the TavBERT pre-trained model, our system incorporates advanced architectural choices and diverse training data. Our experiments show state-of-the-art results on several benchmark datasets, with particular emphasis on modern texts and more specific diacritization such as gender.
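To make the task concrete: diacritization is commonly framed as per-character labeling, where a model reads undotted Hebrew text and predicts the marks (nikud) attached to each letter. The sketch below illustrates only that framing by splitting dotted text into bare characters and per-character mark labels, and joining them back. It is not taken from the D-Nikud code; the function names and the chosen set of marks are assumptions made for illustration.

```python
# Hebrew vowel points and dagesh (U+05B0..U+05BC), the shin/sin dots,
# and qamats qatan. Assumption: this set is chosen for illustration and
# need not match the label inventory used by D-Nikud.
NIKUD = {chr(c) for c in range(0x05B0, 0x05BD)} | {"\u05C1", "\u05C2", "\u05C7"}


def split_diacritics(text):
    """Split dotted text into (bare_text, labels).

    labels[i] holds the string of marks that followed bare_text[i]
    in the input (empty if the character carried no marks).
    """
    bare, labels = [], []
    for ch in text:
        if ch in NIKUD and bare:
            # Attach the mark to the most recent base character.
            labels[-1] += ch
        else:
            bare.append(ch)
            labels.append("")
    return "".join(bare), labels


def apply_diacritics(bare, labels):
    """Inverse of split_diacritics: re-attach marks to each character."""
    return "".join(ch + marks for ch, marks in zip(bare, labels))


# "shalom" with nikud: shin + qamats + shin dot, lamed, vav + holam, final mem
dotted = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
bare, labels = split_diacritics(dotted)
restored = apply_diacritics(bare, labels)
```

In this framing, `bare` plays the role of the model input and `labels` the prediction target, so a trained system would replace the known `labels` with predicted ones before calling `apply_diacritics`.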
