CL | Jun 15, 2022

KE-QI: A Knowledge Enhanced Article Quality Identification Dataset

arXiv:2206.07556v3 | h-index: 20
Originality: Incremental advance
AI Analysis

This work addresses the need to screen outstanding articles for social media, but the advance is incremental: it introduces a new dataset and model for a specific domain.

The authors tackled the problem of identifying high-quality articles by creating a new dataset (KE-QI) with 10k articles annotated using 7 objective indicators, and proposed a compound model that fuses text and external knowledge, achieving about 78% F1 score and outperforming baselines.

With so many articles of varying quality being produced every moment, it is an urgent task to screen outstanding articles and commit them to social media. To the best of our knowledge, there is a lack of datasets and mature research on identifying high-quality articles. We therefore conduct surveys and settle on 7 objective indicators, which we use to annotate the quality of 10k articles. During annotation, we find that many characteristics of high-quality articles (e.g., background) rely more on extensive external knowledge than on the articles' internal semantic information. In response, we link entities extracted from the articles to Baidu Encyclopedia and propose the Knowledge Enhanced article Quality Identification (KE-QI) dataset. To make better use of external knowledge, we propose a compound model that fuses text and external knowledge information via a gate unit to classify the quality of an article. Our experimental results on KE-QI show that, when initialized with our pre-trained Node2Vec model, our model achieves about 78% F1, outperforming the baselines.
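
The abstract says only that text and external knowledge are fused "via a gate unit" and does not give the formula. As a rough illustration, the sketch below assumes one common form of gated fusion: a sigmoid gate computed from the concatenated vectors, used to interpolate element-wise between the text vector and the knowledge vector. The function name `gated_fusion`, the parameters `weights` and `bias`, and the toy dimensions are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, know_vec, weights, bias):
    """Fuse two equal-length vectors with an element-wise gate.

    Assumed form (not confirmed by the source):
        gate  = sigmoid(W @ [text; know] + b)
        fused = gate * text + (1 - gate) * know
    """
    if len(text_vec) != len(know_vec):
        raise ValueError("text and knowledge vectors must have the same length")
    concat = list(text_vec) + list(know_vec)
    fused = []
    for i, (t, k) in enumerate(zip(text_vec, know_vec)):
        # One gate value per output dimension, computed from both inputs.
        pre_activation = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(pre_activation)
        fused.append(gate * t + (1.0 - gate) * k)
    return fused


# Toy usage with random stand-ins for a text encoding and a
# knowledge-graph (e.g. Node2Vec-style) embedding.
rng = random.Random(0)
dim = 4
text_vec = [rng.uniform(-1, 1) for _ in range(dim)]
know_vec = [rng.uniform(-1, 1) for _ in range(dim)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused = gated_fusion(text_vec, know_vec, weights, bias)
print(fused)
```

Because each gate value lies strictly between 0 and 1, every fused component falls between the corresponding text and knowledge components. In a trained model the gate parameters would be learned jointly with the classifier, so the model can decide per dimension how much to rely on external knowledge.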

