Character-Level Chinese Dependency Parsing via Modeling Latent Intra-Word Structure
This paper addresses syntactic (dependency) parsing of Chinese for NLP applications and represents an incremental improvement over existing methods.
The paper tackles Chinese dependency parsing in the absence of clear word boundaries by modeling latent intra-word structures at the character level. In experiments on Chinese treebanks, this approach outperforms both pipeline and joint models.
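To make "latent intra-word structure" concrete, the sketch below combines a word-level dependency tree with one intra-word tree per word to produce a single character-level tree. Every non-root character attaches inside its own word, and each word's root character attaches to the root character of its head word. The function name `to_char_tree`, its input encoding, and the intra-word structures in the example are all hypothetical. In the paper these structures are latent rather than annotated, so the example simply fixes one arbitrary choice.

```python
from typing import List, Sequence


def to_char_tree(words: Sequence[str], word_heads: Sequence[int],
                 intra_heads: Sequence[Sequence[int]]) -> List[int]:
    """Combine a word-level tree with per-word intra-word trees (hypothetical helper).

    word_heads[w]:     1-based head of word w+1 (0 = ROOT).
    intra_heads[w][c]: 1-based head of character c+1 inside word w+1; 0 marks the
                       word's single root character.
    Returns character heads, where heads[0] == -1 stands for ROOT itself.
    """
    offsets, total = [], 0
    for word in words:
        offsets.append(total)
        total += len(word)
    # Global (1-based) index of each word's root character.
    roots = [offsets[w] + list(intra_heads[w]).index(0) + 1 for w in range(len(words))]

    heads = [-1] * (total + 1)
    for w, word in enumerate(words):
        assert len(intra_heads[w]) == len(word) and list(intra_heads[w]).count(0) == 1
        for c, h in enumerate(intra_heads[w]):
            char = offsets[w] + c + 1
            if h == 0:  # word root: attach to the root character of the head word
                heads[char] = 0 if word_heads[w] == 0 else roots[word_heads[w] - 1]
            else:       # intra-word arc: the head stays inside the same word
                heads[char] = offsets[w] + h
    return heads
```

For example, with the segmentation 中国 / 经济 / 发展, word-level heads `[2, 3, 0]` and the assumed intra-word heads `[[2, 0], [2, 0], [0, 1]]`, the function returns `[-1, 2, 4, 4, 5, 0, 5]`. Here 国, 济 and 发 act as word roots, and only they carry inter-word arcs.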
Revealing the syntactic structure of Chinese sentences poses significant challenges for word-level parsers due to the absence of clear word boundaries. To facilitate a transition from word-level to character-level Chinese dependency parsing, this paper proposes modeling latent internal structures within words. In this way, each word-level dependency tree is interpreted as a forest of character-level trees. A constrained Eisner algorithm is implemented to ensure the compatibility of character-level trees, guaranteeing that each word's intra-word structure has a single root and that inter-word dependencies are established between these roots. Experiments on Chinese treebanks demonstrate the superiority of our method over both the pipeline framework and previous joint models. A detailed analysis reveals that a coarse-to-fine parsing strategy enables the model to predict more linguistically plausible intra-word structures.
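One way to impose the compatibility constraint is to prune items inside Eisner's cubic dynamic program. The sketch below is not the paper's implementation. It assumes the word segmentation is given (`word_ids`) and that arc scores come from some hypothetical scorer (`scores`). It applies three rules. An arc between characters of different words is built only when the two halves it joins meet at a word boundary. A complete span extended through an intra-word arc may not leave that word. A complete span extended through an inter-word arc must end on a word boundary. Together these rules force every word to be a subtree with a single root character and allow inter-word arcs only between such roots. A single sentence root is enforced by attaching exactly one character to ROOT (index 0).

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def constrained_eisner(scores: Sequence[Sequence[float]],
                       word_ids: Sequence[int]) -> List[int]:
    """Highest-scoring projective character tree compatible with a fixed segmentation.

    scores[h][d]: score of arc h -> d; index 0 is ROOT, 1..n are characters.
    word_ids[i]:  word index of character i (word_ids[0] is ignored).
    Returns heads with heads[d] the head of character d and heads[0] == -1.
    """
    n = len(word_ids) - 1
    # start[i] / end[i]: first / last character of the word that contains i.
    start, end = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n + 1):
        start[i] = i if i == 1 or word_ids[i - 1] != word_ids[i] else start[i - 1]
    for i in range(n, 0, -1):
        end[i] = i if i == n or word_ids[i + 1] != word_ids[i] else end[i + 1]

    size = n + 1
    cr = [[NEG_INF] * size for _ in range(size)]   # cr[h][j]: h heads span h..j
    cl = [[NEG_INF] * size for _ in range(size)]   # cl[h][i]: h heads span i..h
    inc = [[NEG_INF] * size for _ in range(size)]  # inc[h][d]: arc h -> d with inner spans
    bp_inc = [[-1] * size for _ in range(size)]    # split point k of inc[h][d]
    bp_c = [[-1] * size for _ in range(size)]      # outermost dependent of a complete span
    for i in range(1, size):
        cr[i][i] = cl[i][i] = 0.0

    for length in range(1, n):
        for s in range(1, n - length + 1):
            t = s + length
            same_word = word_ids[s] == word_ids[t]
            # Incomplete spans: an inter-word arc may only join halves at a word boundary.
            best, best_k = NEG_INF, -1
            for k in range(s, t):
                if not same_word and end[k] != k:
                    continue
                v = cr[s][k] + cl[t][k + 1]
                if v > best:
                    best, best_k = v, k
            inc[s][t], inc[t][s] = best + scores[s][t], best + scores[t][s]
            bp_inc[s][t] = bp_inc[t][s] = best_k

            # Right complete span cr[s][t]; m is the outermost right dependent of s.
            best, best_m = NEG_INF, -1
            for m in range(s + 1, t + 1):
                if word_ids[m] == word_ids[s]:
                    if t > end[m]:      # intra-word dependent: subtree stays in its word
                        continue
                elif end[t] != t:       # inter-word dependent: subtree covers whole words
                    continue
                v = inc[s][m] + cr[m][t]
                if v > best:
                    best, best_m = v, m
            cr[s][t], bp_c[s][t] = best, best_m

            # Left complete span cl[t][s]; m is the outermost left dependent of t.
            best, best_m = NEG_INF, -1
            for m in range(s, t):
                if word_ids[m] == word_ids[t]:
                    if s < start[m]:
                        continue
                elif start[s] != s:
                    continue
                v = inc[t][m] + cl[m][s]
                if v > best:
                    best, best_m = v, m
            cl[t][s], bp_c[t][s] = best, best_m

    # Exactly one character (a word root, by the constraints) attaches to ROOT.
    root = max(range(1, size), key=lambda r: cl[r][1] + cr[r][n] + scores[0][r])
    heads = [-1] * size
    heads[root] = 0

    def complete(h: int, e: int) -> None:
        if h != e:
            m = bp_c[h][e]
            incomplete(h, m)
            complete(m, e)

    def incomplete(h: int, d: int) -> None:
        heads[d] = h
        k = bp_inc[h][d]
        complete(min(h, d), k)       # right half of the left endpoint
        complete(max(h, d), k + 1)   # left half of the right endpoint

    complete(root, 1)
    complete(root, n)
    return heads
```

Because the constraints only prune items, the decoder keeps the O(n³) cost of the unconstrained algorithm. On short inputs, its output can be checked against a brute-force enumeration of all compatible projective trees.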