Labeled Morphological Segmentation with Semi-Markov Models
This work addresses morphological processing in computational linguistics, reporting improvements in segmentation, stemming, and morphological tag classification.
The paper tackles morphological segmentation by introducing a unified framework and a new hierarchy of morphotactic tagsets, reporting absolute F1 improvements of 2-6 points over the baseline across six languages.
We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks. From an annotation standpoint, we additionally introduce a new hierarchy of morphotactic tagsets. Finally, we develop \textsc{chipmunk}, a discriminative morphological segmentation system that, contrary to previous work, explicitly models morphotactics. We show that \textsc{chipmunk} yields improved performance on three tasks for all six languages: (i) morphological segmentation, (ii) stemming, and (iii) morphological tag classification. On morphological segmentation, our method shows absolute improvements of 2--6 $F_1$ points over the baseline.