CL · May 5, 2017

Building Morphological Chains for Agglutinative Languages

arXiv:1705.02314v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses morphological analysis for languages like Turkish, offering incremental improvements over existing unsupervised segmentation systems.

The paper tackles morphological segmentation for agglutinative languages by extending the unsupervised log-linear MorphoChains model with recursive candidate generation. It reports a 12% improvement in Turkish F-measure, to 72%, and a 3% improvement in English, to 74%.

In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system MorphoChains. We extend the MorphoChains log-linear model by recursively expanding the candidate space to cover more split points for agglutinative languages such as Turkish, whereas the original model generates candidates by considering only binary segmentations of each word. The results show that we improve the state-of-the-art Turkish scores by 12%, reaching an F-measure of 72%, and the English scores by 3%, reaching an F-measure of 74%. Overall, the system outperforms both MorphoChains and other well-known unsupervised morphological segmentation systems. The results indicate that candidate generation plays an important role in an unsupervised log-linear model of this kind, which is learned using contrastive estimation with negative samples.
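
The difference between binary and recursive candidate generation can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the `min_stem` parameter, and the Turkish example word are assumptions chosen for illustration, and the sketch only enumerates suffix split points, whereas a real system may attach more information to each candidate. Binary generation splits a word once into a parent and a suffix; the recursive variant keeps splitting the parent, so multi-suffix analyses become reachable.

```python
from typing import List, Tuple


def binary_candidates(word: str, min_stem: int = 2) -> List[Tuple[str, str]]:
    """Return (parent, suffix) pairs produced by a single split point.

    min_stem is a hypothetical lower bound on the parent length.
    """
    return [(word[:i], word[i:]) for i in range(min_stem, len(word))]


def recursive_candidates(word: str, min_stem: int = 2) -> List[Tuple[str, ...]]:
    """Return every segmentation reachable by repeatedly splitting the parent.

    Each result is a tuple of segments whose concatenation is the word.
    """
    results: List[Tuple[str, ...]] = [(word,)]  # the unsplit word itself
    for parent, suffix in binary_candidates(word, min_stem):
        # Expand the parent again, then append the suffix that was split off.
        for chain in recursive_candidates(parent, min_stem):
            results.append(chain + (suffix,))
    return results


if __name__ == "__main__":
    word = "evlerde"  # illustrative Turkish word: ev + ler + de
    print(binary_candidates(word))
    print(("ev", "ler", "de") in recursive_candidates(word))
```

With a single split point, the analysis ev + ler + de cannot appear as one candidate, because only one boundary is proposed at a time. The recursive expansion includes it, at the cost of a candidate set that grows exponentially with word length, so a practical system would likely need pruning.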

