Bias Redistribution in Visual Machine Unlearning: Does Forgetting One Group Harm Another?
This work examines the fairness implications of machine unlearning performed to comply with privacy regulations. It shows that unlearning can amplify bias in AI systems, though the contribution is incremental: it mainly highlights limitations of existing methods.
The study investigates whether forgetting a specific demographic group through machine unlearning redistributes bias to other groups, using CLIP models on CelebA with intersectional age and gender categories. The results show that unlearning redistributes bias primarily along gender lines, most notably from Young Female to Old Female, and that current methods fail to eliminate bias without degrading performance.
Machine unlearning enables models to selectively forget training data, driven by privacy regulations such as GDPR and CCPA. However, its fairness implications remain underexplored: when a model forgets a demographic group, does it neutralize that concept or redistribute it to correlated groups, potentially amplifying bias? We investigate this bias redistribution phenomenon on CelebA using CLIP models (ViT-B/32, ViT-L/14, ViT-B/16) in a zero-shot classification setting across intersectional groups defined by age and gender. We evaluate three unlearning methods (Prompt Erasure, Prompt Reweighting, and Refusal Vector) using per-group accuracy shifts, demographic parity gaps, and a redistribution score. Our results show that unlearning does not eliminate bias but redistributes it, primarily along gender rather than age boundaries. In particular, removing the dominant Young Female group consistently transfers performance to Old Female across all model scales, revealing a gender-dominant structure in CLIP's embedding space. While the Refusal Vector method reduces redistribution, it fails to achieve complete forgetting and significantly degrades retained performance. These findings highlight a fundamental limitation of current unlearning methods: without accounting for embedding geometry, they risk amplifying bias in retained groups.
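To make the three evaluation quantities concrete, the following is a minimal sketch of how they might be computed from predictions made before and after unlearning. It is not the authors' implementation. The abstract does not define the redistribution score, so `redistribution_score` below is a hypothetical stand-in (accuracy gained by retained groups divided by accuracy lost by the forgotten group), and all function and variable names are assumptions introduced for illustration.

```python
from collections import defaultdict


def per_group_accuracy(groups, y_true, y_pred):
    """Accuracy computed separately for each demographic group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_shift(acc_before, acc_after):
    """Per-group change in accuracy caused by unlearning (after minus before)."""
    return {g: acc_after[g] - acc_before[g] for g in acc_before}


def demographic_parity_gap(groups, y_pred, positive=1):
    """Largest difference in positive-prediction rate between any two groups."""
    positives = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(groups, y_pred):
        total[g] += 1
        positives[g] += int(p == positive)
    rates = [positives[g] / total[g] for g in total]
    return max(rates) - min(rates)


def redistribution_score(shift, forgotten):
    """HYPOTHETICAL definition, not taken from the paper:
    accuracy gained by retained groups per unit of accuracy
    lost by the forgotten group."""
    drop = -shift[forgotten]
    if drop <= 0:
        return 0.0  # nothing was forgotten, so nothing was redistributed
    gained = sum(max(s, 0.0) for g, s in shift.items() if g != forgotten)
    return gained / drop
```

Under this assumed definition, a score near 0 would mean the forgotten group's performance simply disappeared, while a larger score would mean retained groups absorbed it, which is the pattern the abstract describes between Young Female and Old Female.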