CL · Sep 7, 2022

That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory

arXiv:2209.02967v1 · 637 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses the challenge of processing texts from different historical periods in Chinese NLP. It is an incremental advance: it builds on existing segmentation methods and extends them with cross-era capabilities.

The paper tackles the problem of Chinese word segmentation across different historical eras by proposing a cross-era learning framework called CROSSWISE, which uses a Switch-memory module to incorporate era-specific knowledge, resulting in significant performance improvements on four corpora.

The evolution of language follows the rule of gradual change. Grammar, vocabulary, and lexical semantics shift over time, resulting in a diachronic linguistic gap. As a result, a considerable amount of text is written in the languages of different eras, which creates obstacles for natural language processing tasks such as word segmentation and machine translation. Although the Chinese language has a long history, previous Chinese natural language processing research has primarily focused on tasks within a single era. We therefore propose CROSSWISE, a cross-era learning framework for Chinese word segmentation (CWS) that uses a Switch-memory (SM) module to incorporate era-specific linguistic knowledge. Experiments on four corpora from different eras show that performance on each corpus improves significantly. Further analyses demonstrate that the SM module can effectively integrate era knowledge into the neural network.
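The abstract describes the Switch-memory module only at a high level. The following is a minimal sketch of the general idea, assuming that each era contributes knowledge through its own dictionary and that a switcher assigns a score to each era. The feature scheme (counting dictionary words in which a character appears as Begin, Middle, End, or Single), the names `lexicon_features`, `switch_memory`, and `MAX_WORD_LEN`, and the softmax weighting are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Set

MAX_WORD_LEN = 4  # hypothetical cap on dictionary word length


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the switcher's per-era scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lexicon_features(sentence: str, i: int, lexicon: Set[str]) -> List[float]:
    """Count dictionary words that cover character i, split by the position
    i occupies inside each word: [Begin, Middle, End, Single]."""
    feats = [0.0, 0.0, 0.0, 0.0]
    n = len(sentence)
    # Enumerate every span [start, end) that contains i and is short enough.
    for start in range(max(0, i - MAX_WORD_LEN + 1), i + 1):
        for end in range(i + 1, min(n, start + MAX_WORD_LEN) + 1):
            if sentence[start:end] not in lexicon:
                continue
            if end - start == 1:
                feats[3] += 1  # single-character word
            elif i == start:
                feats[0] += 1  # i begins the word
            elif i == end - 1:
                feats[2] += 1  # i ends the word
            else:
                feats[1] += 1  # i is inside the word
    return feats


def switch_memory(sentence: str, i: int, era_lexicons: Sequence[Set[str]],
                  switch_logits: Sequence[float]) -> List[float]:
    """Hypothetical switch: weight each era's lexicon features by the softmax
    of the switcher's scores and sum them into one memory vector for char i."""
    if len(era_lexicons) != len(switch_logits):
        raise ValueError("need one switch score per era lexicon")
    weights = softmax(switch_logits)
    memory = [0.0, 0.0, 0.0, 0.0]
    for w, lexicon in zip(weights, era_lexicons):
        feats = lexicon_features(sentence, i, lexicon)
        memory = [m + w * f for m, f in zip(memory, feats)]
    return memory
```

For example, with a toy "ancient" dictionary `{"天", "下", "太平"}` and a toy "modern" dictionary `{"天下", "太平"}`, calling `switch_memory("天下太平", 1, [ancient, modern], [0.0, 0.0])` weights both eras equally, so the character 下 receives half a "Single" signal and half an "End" signal. Raising one era's score shifts the memory toward that era's segmentation evidence, which is the intuition behind switching between era-specific knowledge.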
