Detecting Gender Bias in Transformer-based Models: A Case Study on BERT
This work addresses gender bias in AI models, a critical fairness issue, but the approach is incremental because it builds on existing attention mechanisms.
The paper tackles gender bias detection in transformer-based models like BERT by proposing a novel method based on attention maps. It finds that the attention matrices Wq and Wk introduce significantly more bias than other modules, and that the degree of bias changes periodically inside the model.
In this paper, we propose a novel gender bias detection method for transformer-based models that uses attention maps. We 1) give an intuitive gender bias judgement method that compares, via attention scores, how strongly each gender is related to an occupation, 2) design a gender bias detector by modifying the attention module, 3) insert the gender bias detector at different positions in the model to reveal the internal gender bias flow, and 4) draw a consistent gender bias conclusion by scanning the entire Wikipedia, a BERT pretraining dataset. We observe that 1) the attention matrices Wq and Wk introduce much more gender bias than other modules (including the embedding layer) and 2) the degree of bias changes periodically inside the model: the attention matrices Q, K, and V and the remaining part of the attention layer (including the fully-connected layer, the residual connection, and the layer normalization module) enhance the gender bias, while the averaged attentions reduce it.
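The attention-score comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's detector. It assumes a single attention head with standard scaled dot-product attention, and it assumes the bias score is the simple difference between the attention an occupation token pays to a male token and to a female token. The function names, the toy token layout, and the random weights are all hypothetical.

```python
import numpy as np

def attention_scores(X, Wq, Wk):
    """Single-head scaled dot-product attention weights (assumed form).

    X:  (seq_len, d_model) token representations
    Wq: (d_model, d_head) query projection
    Wk: (d_model, d_head) key projection
    Returns a (seq_len, seq_len) matrix whose rows sum to 1.
    """
    Q = X @ Wq
    K = X @ Wk
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

def gender_bias_score(attn, occupation_idx, male_idx, female_idx):
    """Hypothetical bias score: attention from the occupation token to the
    male token minus attention to the female token.
    Positive values lean male, negative values lean female, 0 is neutral."""
    return float(attn[occupation_idx, male_idx] - attn[occupation_idx, female_idx])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy sequence standing in for the tokens ["he", "she", "nurse"].
    X = rng.normal(size=(3, 8))
    Wq = rng.normal(size=(8, 4))
    Wk = rng.normal(size=(8, 4))
    attn = attention_scores(X, Wq, Wk)
    print("bias score:", gender_bias_score(attn, occupation_idx=2, male_idx=0, female_idx=1))
```

Because the score depends only on Wq and Wk applied to the incoming representations, recomputing it from the hidden states at different depths of a model is one way to picture how a detector placed at different positions could trace a change in bias. How the paper actually aggregates scores over layers, heads, and sentences is not specified here.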