Fair Interpretable Learning via Correction Vectors
This work addresses practitioners' need for interpretable fair learning methods, though its contribution is incremental, as it builds on existing debiasing techniques.
The paper tackles the opacity of fair representation learning methods by proposing a framework built on correction vectors that are added to the original features, which makes the learned changes interpretable. The authors show experimentally that this constraint does not impact performance.
Neural network architectures have been extensively employed in the fair representation learning setting, where the objective is to learn a new representation for a given vector that is independent of sensitive information. Various "representation debiasing" techniques have been proposed in the literature. However, as neural networks are inherently opaque, these methods are hard to comprehend, which limits their usefulness. We propose a new framework for fair representation learning which is centered around the learning of "correction vectors", which have the same dimensionality as the given data vectors. The corrections are then simply added to the original features, and can therefore be analyzed as an explicit penalty or bonus to each feature. We show experimentally that a fair representation learning problem constrained in such a way does not impact performance.
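To make the idea of a correction vector concrete, the following is a minimal sketch rather than the paper's method. The paper learns its corrections with a neural network; here a simple group-mean alignment is swapped in as a stand-in so the example stays self-contained. The function names, the feature names, and the toy data are all hypothetical. What the sketch is meant to show is only the structural point from the abstract: the fair representation is the original vector plus a correction of the same dimensionality, so each entry of the correction reads as a bonus or penalty on one feature.

```python
from statistics import mean


def fit_group_mean_correction(X, s):
    """Return one correction vector per group (hypothetical stand-in).

    Each correction shifts a group's per-feature mean onto the overall mean.
    This is NOT the learned correction from the paper, only an illustration.
    """
    n_features = len(X[0])
    overall = [mean(row[j] for row in X) for j in range(n_features)]
    corrections = {}
    for group in set(s):
        rows = [row for row, g in zip(X, s) if g == group]
        group_mean = [mean(row[j] for row in rows) for j in range(n_features)]
        corrections[group] = [overall[j] - group_mean[j] for j in range(n_features)]
    return corrections


def apply_correction(x, w):
    """Corrected representation z = x + w, with w the same size as x."""
    return [xi + wi for xi, wi in zip(x, w)]


# Toy data (made up): two features, binary sensitive attribute s.
feature_names = ["income", "tenure"]
X = [[30.0, 2.0], [50.0, 4.0], [20.0, 6.0], [40.0, 8.0]]
s = [0, 0, 1, 1]

corrections = fit_group_mean_correction(X, s)
Z = [apply_correction(x, corrections[g]) for x, g in zip(X, s)]

# Each correction entry is directly readable as a per-feature adjustment.
for group in sorted(corrections):
    for name, value in zip(feature_names, corrections[group]):
        kind = "bonus" if value > 0 else "penalty" if value < 0 else "no change"
        print(f"group {group}: {name} {value:+.1f} ({kind})")
```

In this toy setting group 0 receives a penalty of 5 on income and a bonus of 2 on tenure, and group 1 receives the opposite adjustments. Because the output lives in the same feature space as the input, the adjustment can be inspected directly, which is the interpretability argument the abstract makes. A learned correction would typically depend on the whole input vector rather than only on group membership.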