CR | Mar 30

Seeing the Unseen: Rethinking Illicit Promotion Detection with In-Context Learning

arXiv:2603.28043 | 76.9 | h-index: 1

AI Analysis

This paper addresses the detection of evolving illicit content on online platforms by proposing a proactive, adaptive moderation approach, though it builds incrementally on existing ICL methods.

The paper proposes In-Context Learning (ICL) as a unified framework for detecting illicit online promotion. It reports performance comparable to fine-tuned models with 22x fewer labeled examples, generalization to unseen threats with a performance drop of less than 6% for most evaluated categories, autonomous discovery of previously undocumented categories, and 92.6% accuracy when deployed across platforms without adaptation.

Illicit online promotion is a persistent threat that evolves to evade detection. Existing moderation systems remain tethered to platform-specific supervision and static taxonomies, a reactive paradigm that struggles to generalize across domains or uncover novel threats. This paper presents a systematic study of In-Context Learning (ICL) as a unified framework for illicit promotion detection. Through rigorous analysis, we show that properly configured ICL achieves performance comparable to fine-tuned models using 22x fewer labeled examples. We demonstrate three key capabilities: (1) Generalization to unseen threats: ICL generalizes to new illicit categories without category-specific demonstrations, with a performance drop of less than 6% for most evaluated categories. (2) Autonomous discovery: A novel two-stage pipeline distills 2,900 free-form labels into coherent taxonomies, surfacing eight previously undocumented illicit categories such as usury and illegal immigration. (3) Cross-platform generalization: Deployed on 200,000 real-world samples from search engines and Twitter without adaptation, ICL achieves 92.6% accuracy. Furthermore, 61.8% of its uniquely flagged samples correspond to borderline or obfuscated content missed by existing detectors. Our findings position ICL as a new paradigm for content moderation, combining the precision of specialized classifiers with cross-platform generalization and autonomous threat discovery. By shifting to inference-time reasoning, ICL offers a path toward proactively adaptive moderation systems.
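For readers unfamiliar with ICL, the following is a minimal sketch of how a few labeled demonstrations might be assembled into a classification prompt at inference time. It is not the paper's configuration: the function name `build_icl_prompt`, the prompt wording, the label set, and the example texts are all hypothetical, and the call to an actual language model is left out.

```python
from typing import List, Tuple


def build_icl_prompt(demos: List[Tuple[str, str]], query: str, labels: List[str]) -> str:
    """Assemble a few-shot classification prompt (hypothetical format).

    demos  -- (text, label) pairs shown to the model as demonstrations
    query  -- the unlabeled text the model should classify
    labels -- the allowed label names
    """
    lines = ["Classify each text as one of: " + ", ".join(labels) + ".", ""]
    for text, label in demos:
        # Each demonstration is a text followed by its known label.
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query repeats the same pattern but leaves the label open.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


# Illustrative placeholder data, not taken from the paper.
demos = [
    ("Fast cash loans, no ID needed, message now", "illicit"),
    ("Our library opens at 9am on weekdays", "benign"),
]
prompt = build_icl_prompt(demos, "Cheap visas guaranteed, contact us privately", ["illicit", "benign"])
print(prompt)
```

Because the demonstrations are plain data passed in at inference time, swapping in examples from a different platform or category changes the prompt without retraining a model, which is the property the abstract relies on.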
