CL · Apr 19, 2021

No comments: Addressing commentary sections in websites' analyses

arXiv:2104.09113v1
AI Analysis

This paper addresses a specific issue for researchers analyzing web content, but its contribution is incremental because it focuses on a niche aspect of data preprocessing.

The paper tackles the problem of commentary sections in websites causing biases in content analysis, showing that these sections can significantly skew results, especially for controversial topics like anti-vaccine websites, and provides guidelines for their removal or extraction.

Removing or extracting the commentary sections from a series of websites is a tedious task, as no standard way to code them is widely adopted. This operation is thus very rarely performed. In this paper, we show that these commentary sections can induce significant biases in the analyses, especially in the case of controversial topics.

Highlights
• Commentary sections can induce biases in the analysis of websites' contents.
• Analyzing these sections can be interesting per se.
• We illustrate these points using a corpus of anti-vaccine websites.
• We provide guidelines to remove or extract these sections.
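To make the removal and extraction step concrete, here is a minimal Python sketch. It is not the authors' procedure or their guidelines. It assumes, purely for illustration, that commentary containers can be recognised by an `id` or `class` attribute containing a hint such as "comment"; since the abstract stresses that no standard coding is widely adopted, the pattern would need adapting per site. The names `COMMENT_HINT`, `CommentSplitter`, and `split_comments` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: comment containers carry one of these hints in id/class.
# Real sites vary widely, so this pattern must be tuned per corpus.
COMMENT_HINT = re.compile(r"comment|disqus|respond", re.IGNORECASE)

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class CommentSplitter(HTMLParser):
    """Collect page text in two buckets: main body and suspected comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.body_text = []
        self.comment_text = []
        self._depth = 0  # > 0 while inside a suspected comment container

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth > 0:
            self._depth += 1  # nested element inside the comment container
            return
        attr_map = dict(attrs)
        marker = f"{attr_map.get('id') or ''} {attr_map.get('class') or ''}"
        if COMMENT_HINT.search(marker):
            self._depth = 1  # entering a suspected comment container

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        bucket = self.comment_text if self._depth > 0 else self.body_text
        bucket.append(text)


def split_comments(html):
    """Return (body_text, comment_text) for one HTML page."""
    parser = CommentSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.body_text), " ".join(parser.comment_text)


page = (
    '<html><body><article><p>Vaccines article text.</p></article>'
    '<div id="comments"><p>First reply<br>more</p></div>'
    '<footer>End</footer></body></html>'
)
body, comments = split_comments(page)
```

Returning both buckets reflects the two uses named in the highlights: the body text can be analysed without the commentary, and the commentary can be studied on its own. The sketch relies on balanced tags inside the comment container and does not filter script or style content, so malformed pages would need a more tolerant parser.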
