A Characterwise Windowed Approach to Hebrew Morphological Segmentation
This work addresses Hebrew morphological segmentation for NLP applications, representing an incremental improvement over prior methods.
The paper tackles the problem of segmenting orthographic word forms in contemporary Hebrew without morphological analysis, achieving over 98% accuracy on benchmark data and 97% on a new out-of-domain dataset, improving on the previous state of the art by about 4% and 5%, respectively.
This paper presents a novel approach to the segmentation of orthographic word forms in contemporary Hebrew, focusing purely on splitting without carrying out morphological analysis or disambiguation. Casting the analysis task as character-wise binary classification and using adjacent-character and word-based lexicon-lookup features, this approach achieves over 98% accuracy on the benchmark SPMRL shared task data for Hebrew, and 97% accuracy on a new out-of-domain Wikipedia dataset, an improvement of ~4% and ~5%, respectively, over previous state-of-the-art performance.
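
To make the character-wise framing concrete, the following is a minimal sketch of how a word could be turned into per-character binary labels and windowed features. It is an illustration under stated assumptions, not the paper's implementation: the function names, the padding symbol, the window width, and the two lexicon-lookup features are all hypothetical, and the abstract does not specify the actual feature set or classifier.

```python
from typing import Dict, List, Set

PAD = "_"  # hypothetical padding symbol for window positions outside the word


def boundary_labels(segments: List[str]) -> List[int]:
    """Return one binary label per character: 1 if a segment boundary
    follows that character, 0 otherwise. Assumes non-empty segments."""
    labels: List[int] = []
    for k, seg in enumerate(segments):
        labels.extend([0] * len(seg))
        if k < len(segments) - 1:
            labels[-1] = 1  # boundary after the last character of this segment
    return labels


def window_features(word: str, i: int, lexicon: Set[str], width: int = 2) -> Dict[str, object]:
    """Features for position i: the characters in a window around i, plus
    two illustrative lexicon lookups for the substrings on either side of
    a candidate split after position i."""
    feats: Dict[str, object] = {}
    for off in range(-width, width + 1):
        j = i + off
        feats[f"char[{off:+d}]"] = word[j] if 0 <= j < len(word) else PAD
    feats["left_in_lex"] = word[: i + 1] in lexicon
    feats["right_in_lex"] = word[i + 1:] in lexicon
    return feats


def apply_labels(word: str, labels: List[int]) -> List[str]:
    """Split a word after every character whose label is 1."""
    segments: List[str] = []
    start = 0
    for i, lab in enumerate(labels):
        if lab == 1:
            segments.append(word[start: i + 1])
            start = i + 1
    if start < len(word):
        segments.append(word[start:])
    return segments


# Illustrative example: "and in (a) house" = conjunction + preposition + noun.
word = "ובבית"
gold = ["ו", "ב", "בית"]
labels = boundary_labels(gold)
features = [window_features(word, i, {"בית"}, width=1) for i in range(len(word))]
```

In this sketch, each position's feature dictionary would be passed to any binary classifier, and the predicted labels are turned back into segments with `apply_labels`. Because the decision is made per character, no morphological analysis of the resulting segments is needed.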