IR · Jun 12, 2014

Assessing the Quality of Web Content

arXiv:1406.3188v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses web content quality evaluation for researchers and practitioners, but it is incremental as it applies existing ensemble methods to a specific challenge.

The paper tackled the ECML/PKDD Discovery Challenge 2010 by developing an ensemble of classifiers for web content quality assessment. It placed second, with NDCG scores of 0.575 for genre classification, 0.852 for English quality, and 0.81 (French) and 0.77 (German) for the multilingual quality task.

This paper describes our approach to the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict labels for unseen Web hosts, where each classifier is trained on a different feature set. Our final NDCG on the whole test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77 (German) for Task 3, which places second in the ECML/PKDD Discovery Challenge 2010.
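To make the abstract's two technical ingredients concrete, the following is a minimal sketch of how scores from three classifiers could be combined and then evaluated with NDCG. Everything specific here is an assumption for illustration: the abstract does not say how the ensemble combines its members (plain score averaging is used below), the feature-set names and score values are made up, and the NDCG variant shown (gain equal to the relevance label, log2 discount) may differ from the exact definition used by the challenge organisers.

```python
import math


def ndcg(relevance, scores, k=None):
    """NDCG of the ranking induced by `scores` (higher score = ranked earlier).

    Assumed variant: gain is the raw relevance label, discount is log2(rank + 1).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevance[i] for i in order]
    ideal = sorted(relevance, reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]

    def dcg(gains):
        # Position 0 is rank 1, so the discount is log2(position + 2).
        return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))

    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_ensemble(score_lists):
    """Combine per-host scores from several classifiers by unweighted averaging."""
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all classifiers must score the same hosts")
    return [sum(col) / len(score_lists) for col in zip(*score_lists)]


# Hypothetical quality labels for four hosts (higher = better quality).
relevance = [3, 0, 2, 1]

# Hypothetical scores from three classifiers, each imagined to be trained
# on a different feature set (names are illustrative, not from the paper).
content_scores = [0.9, 0.2, 0.4, 0.5]
link_scores = [0.7, 0.1, 0.8, 0.3]
term_scores = [0.8, 0.3, 0.6, 0.2]

combined = average_ensemble([content_scores, link_scores, term_scores])

print("content only:", round(ndcg(relevance, content_scores), 3))
print("ensemble:    ", round(ndcg(relevance, combined), 3))
```

In this toy data each individual classifier misorders at least one pair of hosts, while the averaged scores recover the ideal ordering. That is the usual motivation for an ensemble over heterogeneous feature sets, though real data will rarely behave this cleanly.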

