Nadir El Manouzi

CLJul 4, 2022

Multilingual Disinformation Detection for Digital Advertising

Zofia Trstanova, Nadir El Manouzi, Maryline Chen et al.

In today's world, the presence of online disinformation and propaganda is more widespread than ever. Independent publishers are funded mostly via digital advertising, which is unfortunately also the case for those publishing disinformation content. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we make the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether the page mentions a topic of interest, then estimates the likelihood of the content being malicious, creating a shortlist of publishers that will be reviewed by human experts. Our system empowers internal teams to proactively, rather than defensively, blacklist unsafe content, thus protecting the reputation of the advertisement provider.

LGDec 18, 2025

Abacus: Self-Supervised Event Counting-Aligned Distributional Pretraining for Sequential User Modeling

Sullivan Castro, Artem Betlei, Thomas Di Martino et al.

Modeling user purchase behavior is a critical challenge in display advertising systems, necessary for real-time bidding. The difficulty arises from the sparsity of positive user events and the stochasticity of user actions, leading to severe class imbalance and irregular event timing. Predictive systems usually rely on hand-crafted "counter" features, overlooking the fine-grained temporal evolution of user intent. Meanwhile, current sequential models extract direct sequential signal, missing useful event-counting statistics. We enhance deep sequential models with self-supervised pretraining strategies for display advertising. Especially, we introduce Abacus, a novel approach of predicting the empirical frequency distribution of user events. We further propose a hybrid objective unifying Abacus with sequential learning objectives, combining stability of aggregated statistics with the sequence modeling sensitivity. Experiments on two real-world datasets show that Abacus pretraining outperforms existing methods accelerating downstream task convergence, while hybrid approach yields up to +6.1% AUC compared to the baselines.

Nadir El Manouzi

2 Papers