CL · LG · Jul 4, 2022

Multilingual Disinformation Detection for Digital Advertising

arXiv:2207.10649v1 · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of disinformation in digital advertising for independent publishers and advertisement providers. It is an incremental contribution, however, because it builds on existing detection methods.

The paper tackles the detection of disinformation websites in digital advertising. It builds a multilingual machine learning model that identifies topics of interest and estimates the likelihood that content is malicious, producing a shortlist for human review so that unsafe content can be blacklisted proactively.

In today's world, online disinformation and propaganda are more widespread than ever. Independent publishers are funded mostly via digital advertising, which is unfortunately also true of those publishing disinformation. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we take the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether a page mentions a topic of interest, then estimates the likelihood of the content being malicious, creating a shortlist of publishers to be reviewed by human experts. Our system empowers internal teams to blacklist unsafe content proactively rather than defensively, thus protecting the reputation of the advertisement provider.
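The abstract describes a cascade: embed each page, keep only pages that mention a topic of interest, score those for malicious likelihood, and shortlist publishers for human review. The sketch below illustrates that cascade in plain Python under loudly stated assumptions. Page embeddings are taken as precomputed by some multilingual encoder (not shown). Topic detection is approximated by cosine similarity to hypothetical topic centroids. The malicious-likelihood estimator is a hypothetical logistic-regression scorer with given weights. The names `Page`, `build_shortlist`, the thresholds, and the toy data are illustrative only and do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


# Hypothetical record: one crawled page plus a precomputed multilingual embedding.
# ASSUMPTION: the embedding model itself is outside the scope of this sketch.
@dataclass
class Page:
    publisher: str
    embedding: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mentions_topic(page: Page, centroids: List[List[float]], threshold: float) -> bool:
    """Stage 1 (ASSUMED technique): on-topic if close enough to any topic centroid."""
    return any(cosine(page.embedding, c) >= threshold for c in centroids)


def malicious_likelihood(page: Page, weights: List[float], bias: float) -> float:
    """Stage 2 (ASSUMED technique): logistic-regression probability in [0, 1]."""
    z = bias + sum(w * x for w, x in zip(weights, page.embedding))
    # Numerically stable sigmoid: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_shortlist(
    pages: List[Page],
    centroids: List[List[float]],
    weights: List[float],
    bias: float,
    topic_threshold: float = 0.7,
    risk_threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Stage 3: rank publishers for human review by their highest page score."""
    best: Dict[str, float] = {}
    for page in pages:
        if not mentions_topic(page, centroids, topic_threshold):
            continue  # off-topic pages never reach the risk model
        score = malicious_likelihood(page, weights, bias)
        if score >= risk_threshold:
            best[page.publisher] = max(score, best.get(page.publisher, 0.0))
    # Highest-risk publishers first; human experts make the final blacklist call.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, purely illustrative.
    demo_pages = [
        Page("a.example", [1.0, 0.1, 0.5]),
        Page("b.example", [0.0, 1.0, 0.0]),
        Page("c.example", [0.9, 0.0, -0.5]),
        Page("a.example", [0.8, 0.0, 0.0]),
    ]
    shortlist = build_shortlist(demo_pages, [[1.0, 0.0, 0.0]], [2.0, 0.0, 1.0], -1.0)
    for publisher, score in shortlist:
        print(f"{publisher}: {score:.3f}")
```

Filtering by topic first keeps the risk scorer away from the bulk of off-topic inventory, and aggregating page scores per publisher (here by taking the maximum, which is an assumption) matches the stated goal of a publisher-level shortlist rather than a page-level one.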
