CV, CY | May 17, 2023

Rethinking Multimodal Content Moderation from an Asymmetric Angle with Mixed-modality

arXiv:2305.10547v3 | 22 citations
Originality: Highly original
AI Analysis

This work addresses the need for effective moderation of harmful multimodal content on social media, and it is an incremental improvement over existing methods.

The paper tackles the problem of multimodal content moderation by proposing a novel model, AM3, which addresses semantic asymmetry between vision and language through an asymmetric fusion architecture and cross-modality contrastive loss, achieving state-of-the-art performance on multimodal and unimodal benchmarks.

There is a rapidly growing need for multimodal content moderation (CM) as more and more content on social media is multimodal in nature. Existing unimodal CM systems may fail to catch harmful content that crosses modalities (e.g., memes or videos), which may lead to severe consequences. In this paper, we present a novel CM model, Asymmetric Mixed-Modal Moderation (AM3), to target multimodal and unimodal CM tasks. Specifically, to address the asymmetry in semantics between vision and language, AM3 has a novel asymmetric fusion architecture that is designed to not only fuse the common knowledge in both modalities but also to exploit the unique information in each modality. Unlike previous works that focus on representing the two modalities into a similar feature space while overlooking the intrinsic difference between the information conveyed in multimodality and in unimodality (asymmetry in modalities), we propose a novel cross-modality contrastive loss to learn the unique knowledge that only appears in multimodality. This is critical as some harmful intent may only be conveyed through the intersection of both modalities. With extensive experiments, we show that AM3 outperforms all existing state-of-the-art methods on both multimodal and unimodal CM benchmarks.
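The abstract does not give the form of the cross-modality contrastive loss, so the following is only a minimal sketch of the general idea that a fused representation should carry information not already present in either modality alone. It is not the AM3 loss. The function name `cross_modality_margin_loss`, the hinge form, the `margin` value, and the use of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_modality_margin_loss(fused, image, text, margin=0.5):
    """Hypothetical hinge-style separation term (NOT the paper's loss).

    Penalizes the fused (multimodal) embedding whenever its cosine
    similarity to a unimodal embedding exceeds `margin`. The intent is
    to illustrate pushing the fused representation away from being a
    copy of either single modality.
    """
    penalty_image = max(0.0, cosine(fused, image) - margin)
    penalty_text = max(0.0, cosine(fused, text) - margin)
    return penalty_image + penalty_text


# A fused embedding identical to the image embedding is penalized;
# one that differs from both unimodal embeddings is not.
copy_of_image = cross_modality_margin_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
distinct = cross_modality_margin_loss([0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

In a real system, a term like this would be combined with the classification objective and computed on learned embeddings in batches. The choice of margin and similarity measure here is arbitrary.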
