Unified Adversarial Invariance
This work addresses fairness and robustness in AI by enabling models to ignore irrelevant (nuisance) or biasing factors of the data, with incremental improvements over existing methods.
The paper tackles the problem of inducing invariance to nuisance and biasing factors in supervised neural networks without requiring nuisance annotations, achieving state-of-the-art performance in fairness settings.
We present a unified invariance framework for supervised neural networks that induces independence from nuisance factors of data without using any nuisance annotations, and that can additionally use labeled information about biasing factors to force their removal from the latent embedding, enabling fair predictions. Invariance to nuisance factors is achieved by learning a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, whereas invariance to biasing factors is achieved by penalizing the network if the latent embedding contains any information about them. We describe an adversarial instantiation of this framework and analyze how it works. Our model outperforms previous work at inducing invariance to nuisance factors without using any labeled information about such variables, and achieves state-of-the-art performance at learning independence from biasing factors in fairness settings.
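The way the terms described above might combine into a single encoder-side objective can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the function names, the weights `alpha`, `beta`, `gamma`, `delta`, the choice of mean squared error and negative log-likelihood, and the simple sign-flipped adversarial terms are all assumptions made for the example. Here `e1` stands for the part of the split embedding used for prediction and `e2` for the remainder; the "guess" arguments stand for the outputs of hypothetical disentangler networks that try to predict one part from the other.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def nll(probs, label):
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(max(probs[label], 1e-12))

def encoder_objective(pred_probs, y, x_recon, x,
                      e2_guess, e2, e1_guess, e1,
                      bias_probs=None, bias=None,
                      alpha=1.0, beta=0.1, gamma=1.0, delta=1.0):
    """Hypothetical loss the encoder minimizes (all weights are assumed)."""
    task = nll(pred_probs, y)   # prediction task, computed from e1
    recon = mse(x_recon, x)     # reconstruction task, computed from (e1, e2)
    # Disentanglement: adversaries try to predict each part of the split
    # embedding from the other; the encoder is rewarded when they fail.
    disent = mse(e2_guess, e2) + mse(e1_guess, e1)
    total = alpha * task + beta * recon - gamma * disent
    if bias_probs is not None:
        # Optional fairness term: a discriminator tries to recover the labeled
        # biasing factor from e1; the encoder is rewarded when it fails.
        total -= delta * nll(bias_probs, bias)
    return total

# Toy values, chosen only to exercise the function.
without_bias = encoder_objective([0.5, 0.5], 0, [0.0, 0.0], [1.0, 1.0],
                                 [0.0], [1.0], [0.0], [0.0])
with_bias = encoder_objective([0.5, 0.5], 0, [0.0, 0.0], [1.0, 1.0],
                              [0.0], [1.0], [0.0], [0.0],
                              bias_probs=[0.5, 0.5], bias=1)
```

In this sketch the bias term is optional, mirroring the abstract's distinction between nuisance factors (no annotations needed) and biasing factors (labels used when available). In an actual adversarial setup the disentanglers and the bias discriminator would be trained in alternation with the encoder to minimize their own errors.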