LG · AI · Dec 18, 2020

Small Business Classification By Name: Addressing Gender and Geographic Origin Biases

arXiv:2012.10348v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of gender and geographic origin biases in small business classification models, which is important for fair customer segmentation.

This paper developed a model to classify small businesses into 66 types based solely on their names, achieving a top-1 f1-score of 60.2%. It explored two methods to mitigate gender and geographic origin biases: replacing given names with placeholders and data augmentation. Hiding given names reduced bias but decreased performance to a 56.6% f1-score, while data augmentation was less effective.

Small business classification is a difficult and important task within many applications, including customer segmentation. Training on small business names introduces gender and geographic origin biases. A model for predicting one of 66 business types based only upon the business name was developed in this work (top-1 f1-score = 60.2%). Two approaches to removing the bias from this model are explored: replacing given names with a placeholder token, and augmenting the training data with gender-swapped examples. Results for both approaches are reported: hiding given names from the model reduced its bias. However, this bias reduction came at the expense of classification performance (top-1 f1-score = 56.6%). Augmenting the training data with gender-swapped samples proved less effective at bias reduction than the name-hiding approach on the evaluated dataset.
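The two mitigation strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the name lexicon `GIVEN_NAMES`, the swap table `SWAP`, the `<NAME>` placeholder string, and all function names are hypothetical, and a real system would need a far larger name list covering many geographic origins.

```python
import re

# Hypothetical, tiny given-name lexicon (assumption, not from the paper).
GIVEN_NAMES = {"maria", "john", "anna", "peter"}
# Hypothetical gender-swap pairs used for augmentation (assumption).
SWAP = {"maria": "john", "john": "maria", "anna": "peter", "peter": "anna"}
# Hypothetical placeholder token (assumption).
PLACEHOLDER = "<NAME>"

# Split a string into alternating runs of letters and non-letters,
# so that joining the tokens reproduces the original string exactly.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def _map_tokens(business_name, fn):
    """Apply fn to every token and rejoin, preserving spacing and punctuation."""
    return "".join(fn(tok) for tok in TOKEN_RE.findall(business_name))


def hide_given_names(business_name):
    """Approach 1: replace any known given name with a placeholder token."""
    return _map_tokens(
        business_name,
        lambda tok: PLACEHOLDER if tok.lower() in GIVEN_NAMES else tok,
    )


def gender_swap(business_name):
    """Replace each known given name with its paired name of the other gender."""
    def swap(tok):
        repl = SWAP.get(tok.lower())
        if repl is None:
            return tok
        # Keep a leading capital if the original token had one.
        return repl.capitalize() if tok[0].isupper() else repl
    return _map_tokens(business_name, swap)


def augment(dataset):
    """Approach 2: add a gender-swapped copy of every example that contains a name."""
    out = list(dataset)
    for name, label in dataset:
        swapped = gender_swap(name)
        if swapped != name:
            out.append((swapped, label))
    return out


train = [("Maria's Bakery", "bakery"), ("Acme Tools", "hardware")]
hidden = [(hide_given_names(n), y) for n, y in train]
augmented = augment(train)
```

Under these assumptions, name hiding removes the gendered signal from the input entirely, whereas augmentation leaves the names visible and instead tries to balance their association with each business type. That difference is one plausible reading of why the abstract reports hiding as the more effective of the two.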

