Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation
This work is relevant to NLP researchers and practitioners concerned with gender bias, but its contribution is incremental: it primarily validates an existing assumption rather than introducing a new mitigation method.
The paper investigates the linear subspace assumption in gender bias mitigation for word representations, generalizing a prior method to a nonlinear version and empirically showing that gender bias is well captured by a linear subspace, thus validating the original assumption.
Bolukbasi et al. (2016) present one of the first gender bias mitigation techniques for word representations. Their method takes pre-trained word representations as input and attempts to isolate a linear subspace that captures most of the gender bias in the representations. As judged by an analogical evaluation task, their method virtually eliminates gender bias in the representations. However, an implicit and untested assumption of their method is that the bias subspace is actually linear. In this work, we generalize their method to a kernelized, nonlinear version. We take inspiration from kernel principal component analysis and derive a nonlinear bias isolation technique. We discuss and overcome some of the practical drawbacks of our method for nonlinear gender bias mitigation in word representations and analyze empirically whether the bias subspace is actually linear. Our analysis shows that gender bias is in fact well captured by a linear subspace, justifying the assumption of Bolukbasi et al. (2016).
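To make the linear subspace idea concrete, here is a minimal sketch of the linear case only: center each gendered word pair, run PCA on the centered vectors, and remove the projection onto the top directions. This is a simplified reading of the approach, not the authors' code, and it does not cover the kernelized version. The function names `bias_subspace` and `neutralize`, the synthetic 5-dimensional "embeddings", and the planted direction `g` are all assumptions made for illustration.

```python
import numpy as np

def bias_subspace(pairs, k=1):
    """Top-k principal directions of pair-centered vectors.

    pairs: array of shape (n_pairs, 2, d), the two vectors of each word pair.
    Returns an array of shape (k, d) with orthonormal rows.
    """
    # Subtract each pair's midpoint so only the within-pair difference remains.
    centers = pairs.mean(axis=1, keepdims=True)
    centered = (pairs - centers).reshape(-1, pairs.shape[-1])
    # Right singular vectors of the centered data are its principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def neutralize(vectors, basis):
    """Remove the component of each row vector that lies in span(basis)."""
    return vectors - vectors @ basis.T @ basis

# Synthetic toy data (an assumption): a planted "gender" direction g,
# with each pair placed at base + g and base - g plus small noise.
rng = np.random.default_rng(0)
d, n_pairs = 5, 10
g = np.zeros(d)
g[0] = 1.0
base = rng.normal(size=(n_pairs, d))
noise = 0.01 * rng.normal(size=(n_pairs, 2, d))
pairs = np.stack([base + g, base - g], axis=1) + noise

basis = bias_subspace(pairs, k=1)   # estimated bias direction, shape (1, d)
cleaned = neutralize(base, basis)   # vectors with that direction removed
```

On this toy data the recovered direction should align closely with the planted `g` (up to sign), and the neutralized vectors have no remaining component along it. The question the paper asks is whether real embeddings behave like this, or whether a nonlinear (kernel PCA style) isolation step captures bias that a linear projection misses.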