CV, AI, LG | Jun 10, 2024

DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection

arXiv:2406.06134v1 | 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses dataset bias for machine learning practitioners with an unsupervised approach, though it appears incremental because it builds on existing generative debiasing methods.

The paper tackles dataset bias in machine learning by proposing DiffInject, a method that uses a pretrained diffusion model to generate synthetic bias-conflict samples for debiasing. The authors report substantial reductions in dataset bias without requiring explicit knowledge of the bias types or any labeling.

Dataset bias is a significant challenge in machine learning, where specific attributes, such as the texture or color of the images, are unintentionally learned, resulting in detrimental performance. To address this, previous efforts have focused on debiasing models either by developing novel debiasing algorithms or by generating synthetic data to mitigate the prevalent dataset biases. However, generative approaches to date have largely relied on using bias-specific samples from the dataset, which are typically too scarce. In this work, we propose DiffInject, a straightforward yet powerful method to augment the dataset with synthetic bias-conflict samples using a pretrained diffusion model. This approach significantly advances the use of diffusion models for debiasing purposes by manipulating the latent space. Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing. Our methodology demonstrates substantial results in effectively reducing dataset bias.

