CV | May 9, 2025

Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

arXiv:2505.06356v1 | 1 citation | h-index: 3 | Has Code | 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Originality: Synthesis-oriented
AI Analysis

This work addresses harmful biases in multimodal datasets, a concern for researchers and developers building responsible AI systems. It is incremental, however, because it focuses on a single dataset.

The paper investigates the prevalence of toxic content in the LLaVA image-text pretraining dataset, proposing mitigation strategies that resulted in a refined dataset with 7,531 toxic image-text pairs removed.

Pretraining datasets are foundational to the development of multimodal models, yet they often contain biases and toxic content inherited from the web-scale corpora they are sourced from. In this paper, we investigate the prevalence of toxicity in the LLaVA image-text pretraining dataset, examining how harmful content manifests across modalities. We present a comprehensive analysis of common toxicity categories and propose targeted mitigation strategies, resulting in a refined, toxicity-mitigated dataset that removes 7,531 toxic image-text pairs from the LLaVA pretraining dataset. We also offer guidelines for implementing robust toxicity detection pipelines. Our findings underscore the need to actively identify and filter toxic content, such as hate speech, explicit imagery, and targeted harassment, to build more responsible and equitable multimodal systems. The toxicity-mitigated dataset is open source and available for further research.
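The abstract describes filtering toxic content across both modalities but does not name the detectors or thresholds used. As an illustration only, the sketch below assumes two hypothetical scorers (one for captions, one for images) that each return a toxicity score in [0, 1], an assumed `threshold` of 0.5, and a rule that removes a pair when either modality's score meets the threshold. The names `Pair`, `filter_pairs`, and the toy scorers are hypothetical and are not the paper's pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Pair:
    """One image-text pair; image_id stands in for the actual image data."""
    image_id: str
    caption: str


# Hypothetical scorer type: maps a pair to a toxicity score in [0, 1].
Scorer = Callable[[Pair], float]


def filter_pairs(
    pairs: Iterable[Pair],
    text_scorer: Scorer,
    image_scorer: Scorer,
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into (kept, removed) using a per-modality toxicity check."""
    kept: List[Pair] = []
    removed: List[Pair] = []
    for pair in pairs:
        # A pair is flagged if either modality is toxic, so take the max score.
        score = max(text_scorer(pair), image_scorer(pair))
        if score >= threshold:
            removed.append(pair)
        else:
            kept.append(pair)
    return kept, removed


if __name__ == "__main__":
    # Toy stand-ins for real classifiers (hypothetical, for demonstration only).
    blocklist = {"slur"}
    flagged_images = {"img_003"}

    def toy_text_scorer(p: Pair) -> float:
        return 1.0 if any(w in blocklist for w in p.caption.lower().split()) else 0.0

    def toy_image_scorer(p: Pair) -> float:
        return 1.0 if p.image_id in flagged_images else 0.0

    data = [
        Pair("img_001", "a dog on a beach"),
        Pair("img_002", "a slur written on a wall"),
        Pair("img_003", "a city street"),
    ]
    kept, removed = filter_pairs(data, toy_text_scorer, toy_image_scorer)
    print(len(kept), len(removed))  # prints "1 2"
```

Taking the maximum of the two scores means a pair with a benign caption but an explicit image (or the reverse) is still removed, which matches the abstract's point that harmful content can manifest in either modality. A real pipeline would replace the toy scorers with trained text and image classifiers and tune the threshold per category.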

Code Implementations: 1 repo