Fixing the Infix: Unsupervised Discovery of Root-and-Pattern Morphology
This work addresses a long-standing challenge in natural language processing for Semitic languages by offering an unsupervised, language-agnostic solution.
The paper tackles the unsupervised discovery of root-and-pattern morphology in Semitic languages, which prior unsupervised approaches had not handled, and shows that its root extractor compares favorably with the widely used ISRI extractor.
We present an unsupervised and language-agnostic method for learning root-and-pattern morphology in Semitic languages. This form of morphology, abundant in Semitic languages, has not been handled in prior unsupervised approaches. We harness the syntactico-semantic information in distributed word representations to solve the long-standing problem of root-and-pattern discovery in Semitic languages. Moreover, we construct an unsupervised root extractor based on the learned rules. We demonstrate the validity of the learned rules across Arabic, Hebrew, and Amharic, and show that our root extractor compares favorably with a widely used, carefully engineered root extractor: ISRI.
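For readers unfamiliar with the term, the following minimal sketch illustrates what root-and-pattern morphology means: a consonantal root is interleaved into a template rather than extended by a prefix or suffix. It is not the method of this paper, which learns its rules from distributed word representations. The function names `apply_pattern` and `extract_root`, the digit-slot notation, and the romanized Arabic examples are all assumptions made for illustration.

```python
# Illustrative sketch only: hand-written templates over romanized text,
# NOT the learned rules or the root extractor described in the abstract.

def apply_pattern(root, pattern):
    """Fill digit slots in `pattern` with root consonants (slot 1 = first consonant)."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


def extract_root(word, pattern):
    """Invert apply_pattern: return the root if `word` fits `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    slots = {}
    for w, p in zip(word, pattern):
        if p.isdigit():
            # A slot that repeats (e.g. a doubled consonant) must match itself.
            if slots.setdefault(int(p), w) != w:
                return None
        elif w != p:
            # Fixed template characters must match exactly.
            return None
    return "".join(slots[i] for i in sorted(slots))


# Hypothetical romanized templates for the Arabic root k-t-b ("write").
print(apply_pattern("ktb", "1a2a3a"))     # kataba
print(apply_pattern("ktb", "ma12uu3"))    # maktuub
print(extract_root("maktuub", "ma12uu3")) # ktb
```

Because the root consonants are scattered through the word, affix-stripping alone cannot recover them, which is why this kind of morphology calls for rules of a different shape.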