CL · Sep 23, 2017

Long Short-Term Memory for Japanese Word Segmentation

arXiv:1709.08011v3 · 1091 citations
Originality: Incremental advance
AI Analysis

The paper addresses a domain-specific problem in Japanese natural language processing, but the advance is incremental because it adapts existing methods from Chinese word segmentation.

This study tackled Japanese word segmentation by proposing an LSTM-based approach to handle orthographic variations and global context, achieving state-of-the-art accuracy on various Japanese corpora.

This study presents a Long Short-Term Memory (LSTM) neural network approach to Japanese word segmentation (JWS). Previous studies on Chinese word segmentation (CWS) succeeded in using recurrent neural networks such as LSTM and gated recurrent units (GRU). However, in contrast to Chinese, Japanese includes several character types, such as hiragana, katakana, and kanji, that produce orthographic variations and increase the difficulty of word segmentation. Additionally, it is important for JWS tasks to consider a global context, and yet traditional JWS approaches rely on local features. In order to address this problem, this study proposes employing an LSTM-based approach to JWS. The experimental results indicate that the proposed model achieves state-of-the-art accuracy with respect to various Japanese corpora.
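To make the role of character types concrete, the sketch below classifies characters by Unicode block and frames segmentation as per-character tagging. It is an illustration only, not the paper's implementation: the function names `char_type`, `to_tags`, and `from_tags` are hypothetical, the two-tag B/I scheme is an assumption (the abstract does not state which tag set the model uses), and the Unicode ranges are a simplification that ignores rarer blocks such as half-width katakana and CJK extensions.

```python
def char_type(ch):
    """Classify one character by its main Unicode block (a simplification)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
        return "kanji"
    return "other"


def to_tags(words):
    """Turn a segmented sentence into (character, type, tag) triples.

    Assumed scheme: "B" marks the first character of a word,
    "I" marks every later character of the same word.
    """
    triples = []
    for word in words:
        for i, ch in enumerate(word):
            triples.append((ch, char_type(ch), "B" if i == 0 else "I"))
    return triples


def from_tags(chars, tags):
    """Rebuild the word list from characters and their B/I tags."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


if __name__ == "__main__":
    # The same word ("apple") in three scripts: an orthographic variation.
    for spelling in ["りんご", "リンゴ", "林檎"]:
        print(spelling, [char_type(ch) for ch in spelling])

    # A segmented phrase and its per-character tags.
    print(to_tags(["林檎", "を", "食べる"]))
```

In a setup like this, a sequence model such as an LSTM would read the character sequence (optionally with the type of each character as an extra input) and predict one tag per character, and `from_tags` would convert the predicted tags back into words.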

