Adversarial Multi-Criteria Learning for Chinese Word Segmentation
This work addresses segmentation inconsistencies in Chinese NLP, offering a method to leverage multiple criteria for better performance, though it is incremental as it builds on existing multi-criteria approaches.
The paper tackles the problem of diverse segmentation criteria in Chinese word segmentation by proposing adversarial multi-criteria learning to integrate shared knowledge from multiple criteria, resulting in significant performance improvements on eight corpora compared to single-criterion learning.
Different linguistic perspectives give rise to many diverse segmentation criteria for Chinese word segmentation (CWS). Most existing methods focus on improving performance under a single criterion. However, it is interesting to exploit these different criteria and mine their common underlying knowledge. In this paper, we propose adversarial multi-criteria learning for CWS, which integrates shared knowledge from multiple heterogeneous segmentation criteria. Experiments on eight corpora with heterogeneous segmentation criteria show that performance on each corpus improves significantly compared to single-criterion learning. The source code for this paper is available on GitHub.
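To make the notion of heterogeneous criteria concrete, the following minimal sketch shows how one sentence can receive different character-level labels under two criteria while still agreeing on several boundaries, which is the shared knowledge that multi-criteria learning aims to exploit. The BMES tagging scheme, the helper name `to_bmes`, and the two segmentations (labeled only as "criterion A" and "criterion B") are illustrative assumptions, not details taken from the abstract.

```python
def to_bmes(words):
    """Convert a list of words into per-character B/M/E/S tags.

    B = begin, M = middle, E = end of a multi-character word;
    S = single-character word. (Assumed tagging scheme.)
    """
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


# Hypothetical segmentations of the same sentence under two criteria.
criterion_a = ["姚明", "进入", "总决赛"]
criterion_b = ["姚", "明", "进入", "总", "决赛"]

chars = list("".join(criterion_a))
tags_a = to_bmes(criterion_a)
tags_b = to_bmes(criterion_b)

# Characters whose labels differ between the two criteria.
disagree = [c for c, a, b in zip(chars, tags_a, tags_b) if a != b]

for c, a, b in zip(chars, tags_a, tags_b):
    print(c, a, b)
```

In this toy case the two criteria agree on three of the seven characters, so a model trained on either corpus sees partially overlapping supervision for the same input.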