Arabic Morphosyntactic Tagging and Dependency Parsing with Large Language Models
This work addresses the challenge of predicting explicit linguistic structure in Arabic, a morphologically rich language, but its contribution is incremental, since it applies existing LLM methods to new tasks.
The study evaluates instruction-tuned large language models on morphosyntactic tagging and dependency parsing for Standard Arabic. It finds that proprietary models approach supervised baselines for tagging and, with retrieval-based in-context learning, become competitive with specialized parsers.
Large language models (LLMs) perform strongly on many NLP tasks, but their ability to produce explicit linguistic structure remains unclear. We evaluate instruction-tuned LLMs on two structured prediction tasks for Standard Arabic: morphosyntactic tagging and labeled dependency parsing. Arabic provides a challenging testbed due to its rich morphology and orthographic ambiguity, which create strong morphology-syntax interactions. We compare zero-shot prompting with retrieval-based in-context learning (ICL) using examples from Arabic treebanks. Results show that prompt design and demonstration selection strongly affect performance: proprietary models approach supervised baselines for feature-level tagging and become competitive with specialized dependency parsers. In raw-text settings, tokenization remains challenging, though retrieval-based ICL improves both parsing and tokenization. Our analysis highlights which aspects of Arabic morphosyntax and syntax LLMs capture reliably and which remain difficult.
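To make the retrieval-based ICL setup concrete, the following is a minimal sketch of demonstration selection and prompt assembly. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which retriever, similarity measure, or prompt template was used. The sketch substitutes character-trigram Jaccard similarity as a stand-in retriever, and the function names (`char_ngrams`, `jaccard`, `retrieve_demonstrations`, `build_prompt`), the prompt wording, and the toy pool are all hypothetical.

```python
from typing import List, Set, Tuple

# An annotated example: (sentence, gold annotation as an opaque string).
# In a real setup these would come from a treebank; here they are placeholders.
Example = Tuple[str, str]


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a whitespace-normalized string."""
    s = " ".join(text.split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_demonstrations(query: str, pool: List[Example], k: int = 2) -> List[Example]:
    """Return the k pool examples most similar to the query.

    ASSUMPTION: character-trigram Jaccard is a stand-in similarity measure,
    chosen only because it needs no external libraries.
    """
    q = char_ngrams(query)
    ranked = sorted(pool, key=lambda ex: jaccard(q, char_ngrams(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, demos: List[Example]) -> str:
    """Assemble a few-shot prompt: instruction, demonstrations, then the query.

    ASSUMPTION: the instruction text and field labels are hypothetical.
    """
    parts = ["Annotate the sentence following the examples."]
    for sentence, annotation in demos:
        parts.append(f"Sentence: {sentence}\nAnnotation: {annotation}")
    parts.append(f"Sentence: {query}\nAnnotation:")
    return "\n\n".join(parts)


# Toy pool with placeholder sentences and annotations (not real treebank data).
pool: List[Example] = [
    ("alpha beta gamma", "<annotation 1>"),
    ("delta epsilon", "<annotation 2>"),
    ("alpha beta delta", "<annotation 3>"),
]

demos = retrieve_demonstrations("alpha beta zeta", pool, k=2)
prompt = build_prompt("alpha beta zeta", demos)
```

With `demos` set to an empty list, `build_prompt` yields the zero-shot variant of the same prompt, which is one simple way to hold the template fixed when comparing the two conditions.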