CL · Jun 24, 2019

On the Definition of Japanese Word

arXiv:1906.09719v1 · 8 citations
Originality: Synthesis-oriented
AI Analysis

This work tackles a foundational annotation problem that matters to Japanese NLP researchers, but it is incremental: it builds on existing non-mainstream definitions of the Japanese word and presents no new empirical results.

The paper addresses the unclear definition of the syntactic word in Japanese for Universal Dependencies annotation. It argues that the Short Unit Words currently used do not meet the UD guidelines, and it explores whether alternative linguistic definitions could feasibly be applied to corpus annotation.

The annotation guidelines for Universal Dependencies (UD) stipulate that the basic units of dependency annotation are syntactic words, but it is not clear what counts as a syntactic word in Japanese. Departing from the long tradition of using phrasal units called bunsetsu for dependency parsing, the current UD Japanese treebanks adopt Short Unit Words. However, we argue that these are not syntactic words as specified by the annotation guidelines. Although we find non-mainstream attempts to define Japanese words linguistically, such definitions have never been applied to corpus annotation. We discuss the costs and benefits of adopting these rather unfamiliar criteria.
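To make the contrast between the two units concrete, the Python sketch below groups a hand-segmented Short Unit Word sequence into bunsetsu-like chunks. Everything here is an illustrative assumption, not the paper's method: the example segmentation, the UD-style POS labels, the hypothetical function `group_into_bunsetsu`, and its deliberately crude rule (a new chunk starts at a content word that follows a function word, and function words attach to the chunk on their left). It is not a standard chunking algorithm, and it says nothing about what a syntactic word should be.

```python
# Assumption: UD-style UPOS labels treated as function words for this sketch.
FUNCTION_POS = {"ADP", "AUX", "PART", "SCONJ"}


def group_into_bunsetsu(tokens):
    """Group (surface, upos) Short Unit Word tokens into bunsetsu-like chunks.

    Hypothetical, deliberately crude heuristic: a new chunk begins at a
    content word that follows a function word (or at the sentence start);
    each function word attaches to the chunk on its left.
    """
    chunks = []
    prev_is_function = True  # so the first content word opens a chunk
    for surface, upos in tokens:
        is_function = upos in FUNCTION_POS
        if not is_function and prev_is_function:
            chunks.append([])
        if not chunks:
            # Sentence starts with a function word: give it its own chunk.
            chunks.append([])
        chunks[-1].append(surface)
        prev_is_function = is_function
    return ["".join(chunk) for chunk in chunks]


# Hand-made, illustrative SUW segmentation of 私は本を読んだ ("I read a book").
suw = [("私", "PRON"), ("は", "ADP"), ("本", "NOUN"),
       ("を", "ADP"), ("読ん", "VERB"), ("だ", "AUX")]

print([surface for surface, _ in suw])  # six SUW tokens
print(group_into_bunsetsu(suw))         # three bunsetsu-like chunks
```

Under this illustrative segmentation, the sentence has six SUW tokens but only three bunsetsu-like chunks (私は, 本を, 読んだ). The abstract describes bunsetsu as phrasal units and argues that SUWs are not syntactic words either, which is why the granularity of the annotation unit is the open question.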
