CL, Feb 27, 2019

CN-Probase: A Data-driven Approach for Large-scale Chinese Taxonomy Construction

arXiv:1902.10326v1, 33 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the scarcity of non-English taxonomies, specifically for Chinese language processing. The contribution is incremental, since it adapts existing methods to a new domain.

The paper tackles the scarcity of large-scale Chinese taxonomies by proposing a data-driven framework that constructs one automatically. The resulting taxonomy, CN-Probase, reaches a precision of about 95% and has been deployed on Aliyun, where it received over 82 million API calls in six months.

Taxonomies play an important role in machine intelligence. However, most well-known taxonomies are in English, and non-English taxonomies, especially Chinese ones, are still very rare. In this paper, we focus on automatic Chinese taxonomy construction and propose an effective generation and verification framework to build a large-scale, high-quality Chinese taxonomy. In the generation module, we extract isA relations from multiple Chinese encyclopedia sources, which ensures coverage. To further improve the precision of the taxonomy, we apply three heuristic approaches in the verification module. As a result, we construct CN-Probase, the largest Chinese taxonomy, with a high precision of about 95%. Our taxonomy has been deployed on Aliyun, with over 82 million API calls in six months.
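
The generate-then-verify structure described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the source names (`source_a`, `source_b`), the sample pairs, and the two checks are hypothetical placeholders, and they are not the three heuristics used in the paper, which the abstract does not spell out.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# An isA relation as an (entity, hypernym) pair, e.g. ("Liu Dehua", "singer").
IsA = Tuple[str, str]


def generate(sources: Dict[str, Iterable[IsA]]) -> Set[IsA]:
    """Generation step: union candidate pairs from all sources for coverage."""
    candidates: Set[IsA] = set()
    for pairs in sources.values():
        candidates.update(pairs)
    return candidates


# Hypothetical verification checks (placeholders, NOT the paper's heuristics).
def hypernym_not_empty(pair: IsA) -> bool:
    return pair[1].strip() != ""


def not_self_loop(pair: IsA) -> bool:
    return pair[0] != pair[1]


CHECKS: List[Callable[[IsA], bool]] = [hypernym_not_empty, not_self_loop]


def verify(candidates: Set[IsA], checks: List[Callable[[IsA], bool]]) -> Set[IsA]:
    """Verification step: keep only pairs that pass every check."""
    return {pair for pair in candidates if all(check(pair) for check in checks)}


# Made-up sample data from two hypothetical sources.
sources = {
    "source_a": [("Liu Dehua", "singer"), ("Liu Dehua", "actor")],
    "source_b": [("Liu Dehua", "actor"), ("Liu Dehua", "Liu Dehua"), ("Shanghai", "")],
}

taxonomy = verify(generate(sources), CHECKS)
print(sorted(taxonomy))
```

Taking the union across sources favors coverage, while requiring every check to pass favors precision, which mirrors the division of labor the abstract assigns to the two modules.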

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
