CL | Jun 6, 2024

Character-Level Chinese Dependency Parsing via Modeling Latent Intra-Word Structure

arXiv:2406.03772v1 | 26 citations
Originality: Incremental advance
AI Analysis

This work addresses syntactic parsing of Chinese for NLP applications and represents an incremental improvement over existing methods.

The paper tackles Chinese dependency parsing in the absence of clear word boundaries by modeling latent intra-word structures at the character level. In experiments on Chinese treebanks, the method outperforms both pipeline and joint models.

Revealing the syntactic structure of sentences in Chinese poses significant challenges for word-level parsers due to the absence of clear word boundaries. To facilitate a transition from word-level to character-level Chinese dependency parsing, this paper proposes modeling latent internal structures within words. In this way, each word-level dependency tree is interpreted as a forest of character-level trees. A constrained Eisner algorithm is implemented to ensure the compatibility of character-level trees, guaranteeing a single root for intra-word structures and establishing inter-word dependencies between these roots. Experiments on Chinese treebanks demonstrate the superiority of our method over both the pipeline framework and previous joint models. A detailed analysis reveals that a coarse-to-fine parsing strategy empowers the model to predict more linguistically plausible intra-word structures.
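The abstract names the constraint (a single root per word, with inter-word arcs between roots) but not how it is encoded. The following is a minimal sketch of one way to add it to first-order projective Eisner decoding, assuming character-level arc scores and a segmentation that is already given. It is not the paper's implementation, and `constrained_eisner`, `is_compatible` and the index layout are hypothetical. The rule used here: whenever an arc crosses a word boundary, the dependent's subtree must cover its whole word. Two characters of the same word cannot both satisfy this, since each subtree would have to contain the other, so every word ends up with exactly one externally headed character, which is its root.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def constrained_eisner(scores: List[List[float]], words: List[Tuple[int, int]]) -> List[int]:
    """Best projective character tree compatible with a fixed segmentation (sketch).

    scores[h][d] scores arc h -> d over positions 0..n (0 is ROOT).
    words holds inclusive 1-based (start, end) spans covering 1..n in order.
    Returns heads with heads[0] == -1.
    """
    n = len(scores)
    wid, ws, we = [0] * n, list(range(n)), list(range(n))  # ROOT is its own "word"
    for k, (a, b) in enumerate(words, start=1):
        for i in range(a, b + 1):
            wid[i], ws[i], we[i] = k, a, b

    # Tables indexed [d][s][t]: d == 1 means the head is s, d == 0 means the head is t.
    C = [[[NEG_INF] * n for _ in range(n)] for _ in range(2)]
    I = [[[NEG_INF] * n for _ in range(n)] for _ in range(2)]
    bC = [[[0] * n for _ in range(n)] for _ in range(2)]
    bI = [[[0] * n for _ in range(n)] for _ in range(2)]
    for s in range(n):
        C[0][s][s] = C[1][s][s] = 0.0

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            cross = wid[s] != wid[t]
            for r in range(s, t):
                base = C[1][s][r] + C[0][r + 1][t]
                # Arc s -> t: t's left subtree is r+1..t, so a cross-word
                # dependent must reach back to the start of its word.
                if (not cross or r + 1 <= ws[t]) and base + scores[s][t] > I[1][s][t]:
                    I[1][s][t], bI[1][s][t] = base + scores[s][t], r
                # Arc t -> s: s's right subtree is s..r (ROOT is never a dependent).
                if s > 0 and (not cross or r >= we[s]) and base + scores[t][s] > I[0][s][t]:
                    I[0][s][t], bI[0][s][t] = base + scores[t][s], r
            for r in range(s + 1, t + 1):  # head s, dependent r, r's right subtree r..t
                if wid[s] != wid[r] and t < we[r]:
                    continue
                if I[1][s][r] + C[1][r][t] > C[1][s][t]:
                    C[1][s][t], bC[1][s][t] = I[1][s][r] + C[1][r][t], r
            for r in range(s, t):  # head t, dependent r, r's left subtree s..r
                if wid[r] != wid[t] and s > ws[r]:
                    continue
                if C[0][s][r] + I[0][r][t] > C[0][s][t]:
                    C[0][s][t], bC[0][s][t] = C[0][s][r] + I[0][r][t], r

    heads = [-1] * n

    def backtrack(s: int, t: int, d: int, complete: bool) -> None:
        if s == t:
            return
        if complete:
            r = bC[d][s][t]
            if d == 1:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
            else:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
        else:
            r = bI[d][s][t]
            if d == 1:
                heads[t] = s
            else:
                heads[s] = t
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads


def is_compatible(heads: List[int], words: List[Tuple[int, int]]) -> bool:
    """For a well-formed tree: each word has exactly one character headed from outside it."""
    return all(sum(not a <= heads[i] <= b for i in range(a, b + 1)) == 1 for a, b in words)
```

With every word a single character, all the constraints hold trivially and the function reduces to ordinary Eisner decoding. How the paper combines this constraint with segmentation prediction and latent intra-word trees during training is not covered by this sketch.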

Code Implementations: 1 repo
