LG · ML · Aug 12, 2020

Null-sampling for Interpretable and Fair Representations

arXiv:2008.05248v1 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses algorithmic fairness by making model changes interpretable for human auditors, though it appears incremental as it builds on existing invariance and adversarial training methods.

The paper tackles the problem of learning invariant representations in the data domain to achieve interpretability and fairness, addressing a setup with strong bias where class labels are irrelevant and spurious correlations cannot be distinguished. It introduces an adversarially trained model with null-sampling, showing effectiveness on image and tabular datasets like Coloured MNIST, CelebA, and Adult.

We propose to learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness. Invariance implies a selectivity for high-level, relevant correlations w.r.t. class label annotations, and a robustness to irrelevant correlations with protected characteristics such as race or gender. We introduce a non-trivial setup in which the training set exhibits a strong bias such that class label annotations are irrelevant and spurious correlations cannot be distinguished. To address this problem, we introduce an adversarially trained model with a null-sampling procedure to produce invariant representations in the data domain. To enable disentanglement, a partially-labelled representative set is used. By placing the representations into the data domain, the changes made by the model are easily examinable by human auditors. We show the effectiveness of our method on both image and tabular datasets: Coloured MNIST, CelebA and Adult.
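To make the null-sampling idea concrete, here is a minimal sketch under loudly stated assumptions, not the paper's implementation. It assumes an invertible encoder whose latent code is split into a part meant to be invariant to the protected attribute and a part meant to capture it; null-sampling then zeroes the protected part and decodes back to the data domain. The learned encoder and its adversarial training are replaced by a fixed random orthogonal matrix, and the function names and the split size are hypothetical.

```python
import numpy as np


def make_invertible_encoder(dim, seed=0):
    # Hypothetical stand-in for a learned invertible encoder:
    # a random orthogonal matrix, whose inverse is its transpose.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q


def encode(w, x):
    # Map each row x_i to z_i = W x_i.
    return x @ w.T


def decode(w, z):
    # Inverse map: each row z_i goes to W^T z_i (valid because W is orthogonal).
    return z @ w


def null_sample(w, x, sensitive_dims):
    # Assumption: the last `sensitive_dims` latent coordinates encode the
    # protected attribute. Zero them and decode back to the data domain.
    if sensitive_dims <= 0:
        raise ValueError("sensitive_dims must be positive")
    z = encode(w, x)
    z[:, -sensitive_dims:] = 0.0
    return decode(w, z)


rng = np.random.default_rng(1)
x = rng.normal(size=(4, 6))          # toy batch: 4 samples, 6 features
w = make_invertible_encoder(6)
x_null = null_sample(w, x, sensitive_dims=2)

# Because x_null lives in the data domain, an auditor can inspect
# exactly what the model removed from each input.
residual = x - x_null
print(residual.round(3))
```

Keeping the output in the input space is the point of the design: the residual `x - x_null` is itself a data-domain object, so the change can be examined directly rather than through an opaque latent vector.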

Code Implementations: 1 repo