Invariant Representations through Adversarial Forgetting
This work addresses the challenge of removing biases and nuisance factors from the representations learned by AI models, which is crucial for fairness and robustness, although it appears incremental because it builds on existing adversarial training methods.
The paper tackles the problem of learning invariant representations in deep neural networks by inducing amnesia to unwanted data factors through adversarial forgetting, achieving state-of-the-art performance on diverse datasets and tasks.
We propose a novel approach to achieving invariance for deep neural networks in the form of inducing amnesia to unwanted factors of data through a new adversarial forgetting mechanism. We show that the forgetting mechanism serves as an information-bottleneck, which is manipulated by the adversarial training to learn invariance to unwanted factors. Empirical results show that the proposed framework achieves state-of-the-art performance at learning invariance in both nuisance and bias settings on a diverse collection of datasets and tasks.
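The abstract does not specify how the forgetting mechanism is built, so what follows is only a minimal sketch under loudly stated assumptions, not the authors' implementation. It assumes that (a) "forgetting" is a sigmoid mask multiplied elementwise into an encoder's embedding, so mask entries near 0 erase the corresponding dimensions and the gate acts as a bottleneck, and (b) the adversarial training combines a task loss, a reconstruction loss, and a negated nuisance-discriminator loss with hypothetical weights `rho` and `lam`. The function names `forget_gate` and `encoder_side_loss` are hypothetical. No networks, gradients, or training loop are shown.

```python
import math
from typing import List


def sigmoid(v: float) -> float:
    """Logistic function mapping a logit to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def forget_gate(z: List[float], mask_logits: List[float]) -> List[float]:
    """Hypothetical forget gate: scale each embedding entry by a sigmoid mask.

    Assumption: the mask is elementwise and multiplicative. Entries whose
    mask is near 0 are effectively erased, which is what lets the gate
    behave like an information bottleneck.
    """
    if len(z) != len(mask_logits):
        raise ValueError("z and mask_logits must have the same length")
    return [zi * sigmoid(li) for zi, li in zip(z, mask_logits)]


def cross_entropy(probs: List[float], label: int) -> float:
    """Negative log-likelihood of the true label under a probability vector."""
    return -math.log(max(probs[label], 1e-12))


def encoder_side_loss(task_loss: float, recon_loss: float, disc_loss: float,
                      rho: float = 1.0, lam: float = 1.0) -> float:
    """Hypothetical objective for the encoder, gate, predictor and decoder.

    The -lam * disc_loss term rewards masked embeddings on which a nuisance
    discriminator does poorly. The discriminator itself would be trained
    separately to minimize disc_loss, giving a min-max (adversarial) game.
    """
    return task_loss + rho * recon_loss - lam * disc_loss


# Usage: a strongly negative logit nearly erases the second dimension.
masked = forget_gate([2.0, -1.0, 0.5], [10.0, -10.0, 0.0])
disc = cross_entropy([0.5, 0.5], 0)  # a discriminator reduced to chance
total = encoder_side_loss(task_loss=1.0, recon_loss=0.5, disc_loss=2.0, lam=0.5)
```

The multiplicative mask is one simple way to make "how much is forgotten" a continuous, trainable quantity; the adversarial term is what would push the mask to drop information about the unwanted factor rather than arbitrary information.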