CL · Aug 22, 2018

A Characterwise Windowed Approach to Hebrew Morphological Segmentation

arXiv:1808.07214v2 · 1089 citations
AI Analysis

This work addresses Hebrew morphological segmentation for NLP applications, representing an incremental improvement over prior methods.

The paper tackles the segmentation of orthographic word forms in contemporary Hebrew without morphological analysis or disambiguation. It reports over 98% accuracy on the SPMRL shared task benchmark data and 97% on a new out-of-domain Wikipedia dataset, improvements of about 4% and 5%, respectively, over the previous state of the art.

This paper presents a novel approach to the segmentation of orthographic word forms in contemporary Hebrew, focusing purely on splitting without carrying out morphological analysis or disambiguation. Casting the analysis task as character-wise binary classification and using adjacent-character and word-based lexicon-lookup features, this approach achieves over 98% accuracy on the benchmark SPMRL shared task data for Hebrew, and 97% accuracy on a new out-of-domain Wikipedia dataset, an improvement of ~4% and ~5%, respectively, over previous state-of-the-art performance.
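To make the character-wise framing concrete, here is a minimal sketch of how a windowed feature extractor and boundary decoder might look. It is an illustration under stated assumptions, not the paper's implementation: the function names `char_features` and `apply_boundaries`, the padding symbol, the window size of 2, the two lexicon features, and the transliterated toy word `wbbyt` (standing in for a Hebrew form with two prefixed clitics) are all hypothetical. The classifier that would map features to labels is deliberately left out.

```python
from typing import Dict, List, Set

PAD = "_"  # hypothetical padding symbol for positions outside the word


def char_features(word: str, i: int, lexicon: Set[str], window: int = 2) -> Dict[str, object]:
    """Features for deciding whether a segment boundary follows word[i]."""
    feats: Dict[str, object] = {}
    # Adjacent-character features: the characters in a window around position i.
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"char[{offset:+d}]"] = word[j] if 0 <= j < len(word) else PAD
    # Lexicon-lookup features (assumed form): are the substrings on either
    # side of the candidate boundary known word forms?
    feats["prefix_in_lex"] = word[: i + 1] in lexicon
    feats["suffix_in_lex"] = word[i + 1:] in lexicon
    return feats


def apply_boundaries(word: str, labels: List[int]) -> List[str]:
    """Rebuild segments from per-character labels (1 = boundary after this character)."""
    segments: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        # A boundary after the final character would create an empty segment, so skip it.
        if label == 1 and i < len(word) - 1:
            segments.append(word[start: i + 1])
            start = i + 1
    segments.append(word[start:])
    return segments


# Toy usage: one binary decision per character, then decode the split.
toy_lexicon = {"w", "b", "byt"}
toy_features = [char_features("wbbyt", i, toy_lexicon) for i in range(len("wbbyt"))]
toy_segments = apply_boundaries("wbbyt", [1, 1, 0, 0, 0])
```

In this framing each character yields one feature dictionary and one binary label, so any standard binary classifier could be trained on the pairs; which classifier and which exact features the paper uses is not stated in the abstract above.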
