CV · May 9, 2025

Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

Karthik Reddy Kanjula, Surya Guthikonda, Nahid Alam, Shayekh Bin Islam

arXiv:2505.06356 · 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Originality: Synthesis-oriented

AI Analysis

This work addresses harmful biases and toxic content in multimodal pretraining data, a problem that matters to researchers and developers building responsible AI systems. Its contribution is incremental, since it focuses on a single dataset.

The paper investigates the prevalence of toxic content in the LLaVA image-text pretraining dataset and proposes mitigation strategies that yield a refined dataset from which 7,531 toxic image-text pairs have been removed.

Pretraining datasets are foundational to the development of multimodal models, yet they often inherit biases and toxic content from the web-scale corpora they are sourced from. In this paper, we investigate the prevalence of toxicity in the LLaVA image-text pretraining dataset, examining how harmful content manifests in different modalities. We present a comprehensive analysis of common toxicity categories and propose targeted mitigation strategies, resulting in a refined, toxicity-mitigated dataset that removes 7,531 toxic image-text pairs from the LLaVA pretraining dataset. We also offer guidelines for implementing robust toxicity detection pipelines. Our findings underscore the need to actively identify and filter toxic content (such as hate speech, explicit imagery, and targeted harassment) to build more responsible and equitable multimodal systems. The toxicity-mitigated dataset is open source and available for further research.
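As a rough illustration of the kind of filtering step described above, the sketch below drops an image-text pair when either modality is flagged. It is a minimal sketch under stated assumptions, not the authors' pipeline. The `Pair` record, the per-pair `text_toxicity` score in [0, 1], the boolean `image_unsafe` flag, the `filter_pairs` helper, and the 0.5 threshold are all hypothetical. In practice these values would come from whatever text and image classifiers a given pipeline uses.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Pair:
    """Hypothetical record for one image-text pair with precomputed scores."""
    pair_id: str
    caption: str
    text_toxicity: float  # assumed score in [0, 1] from some text classifier
    image_unsafe: bool    # assumed verdict from some image safety classifier


def filter_pairs(
    pairs: Iterable[Pair], text_threshold: float = 0.5
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into (kept, removed); a pair is removed if EITHER modality is flagged."""
    kept: List[Pair] = []
    removed: List[Pair] = []
    for p in pairs:
        text_flagged = p.text_toxicity >= text_threshold  # hypothetical cutoff
        if text_flagged or p.image_unsafe:
            removed.append(p)
        else:
            kept.append(p)
    return kept, removed


if __name__ == "__main__":
    # Made-up example records, not data from the paper.
    sample = [
        Pair("a", "a dog on a beach", 0.02, False),
        Pair("b", "caption with slurs", 0.91, False),
        Pair("c", "benign caption", 0.05, True),
    ]
    kept, removed = filter_pairs(sample)
    print([p.pair_id for p in kept])     # ['a']
    print([p.pair_id for p in removed])  # ['b', 'c']
```

Combining the two modalities with a logical OR is a conservative choice. It removes a pair if either its image or its caption is flagged, at the cost of discarding some pairs where only one side is borderline.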

