LGAI, May 23, 2023

Fair Oversampling Technique using Heterogeneous Clusters

arXiv:2305.13875v1 · 19 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in fair machine learning for imbalanced datasets, offering an incremental improvement to existing techniques.

The paper tackles classifier overfitting in fair oversampling techniques, which arises when the original clusters are small. It develops a method that generates synthetic data with class-mix or group-mix features, together with an interpolation method that improves the validity of the generated data. Experiments on five datasets with three classifiers show gains in both fairness and utility.

Class imbalance and group (e.g., race, gender, and age) imbalance are acknowledged as two properties of data that degrade the trade-off between the fairness and utility of machine learning classifiers. Existing techniques have jointly addressed class imbalance and group imbalance by proposing fair oversampling techniques. Unlike common oversampling techniques, which address only class imbalance, fair oversampling techniques significantly improve this trade-off because they also address group imbalance. However, if the original clusters are too small, these techniques may cause classifier overfitting. To address this problem, we develop a fair oversampling technique that uses data from heterogeneous clusters. The proposed technique generates synthetic data with class-mix or group-mix features to make classifiers robust to overfitting. Moreover, we develop an interpolation method that enhances the validity of the generated synthetic data by considering the original cluster distribution and data noise. Finally, we conduct experiments on five realistic datasets with three classifiers, and the results demonstrate the effectiveness of the proposed technique in terms of fairness and utility.
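The abstract does not spell out the generation procedure, so the following is only a rough illustration of the general idea under stated assumptions, not the paper's algorithm. It assumes that a "cluster" is a (class, group) pair, that each cluster is topped up to the size of the largest one, and that each synthetic point is made by SMOTE-style linear interpolation from a seed in the target cluster toward one of its k nearest points in a heterogeneous cluster: same class but another group for group-mix features, or same group but another class for class-mix features. The helper names `cluster_deficits` and `heterogeneous_oversample`, the parameters `k` and `max_lam`, and the balancing target are all hypothetical. The paper's own interpolation, which also accounts for the original cluster distribution and data noise, is not reproduced here.

```python
import numpy as np
from collections import Counter


def cluster_deficits(y, g):
    """Hypothetical helper: synthetic samples each (class, group) cluster
    needs to reach the size of the largest cluster."""
    counts = Counter(zip(y.tolist(), g.tolist()))
    target = max(counts.values())
    return {key: target - n for key, n in counts.items()}


def heterogeneous_oversample(X, y, g, mix="group", k=3, max_lam=0.5, seed=0):
    """Hypothetical sketch: oversample each (class, group) cluster by
    interpolating toward neighbours drawn from a *different* cluster.

    mix="group": partner shares the class but not the group (group-mix).
    mix="class": partner shares the group but not the class (class-mix).
    lam is drawn from [0, max_lam), so with max_lam <= 0.5 each synthetic
    point stays closer to its seed than to its partner and is given the
    seed's class and group labels (an assumption, not the paper's rule).
    """
    rng = np.random.default_rng(seed)
    X_new, y_new, g_new = [], [], []
    for (c, grp), need in cluster_deficits(y, g).items():
        if need == 0:
            continue
        in_cluster = (y == c) & (g == grp)
        if mix == "group":
            partner_mask = (y == c) & (g != grp)
        else:
            partner_mask = (y != c) & (g == grp)
        if not partner_mask.any():
            # No heterogeneous partner exists: fall back to plain
            # within-cluster (SMOTE-like) interpolation.
            partner_mask = in_cluster
        seeds, partners = X[in_cluster], X[partner_mask]
        for _ in range(need):
            a = seeds[rng.integers(len(seeds))]
            # Choose one of the k nearest partners by Euclidean distance.
            dist = np.linalg.norm(partners - a, axis=1)
            b = partners[rng.choice(np.argsort(dist)[:k])]
            lam = rng.uniform(0.0, max_lam)
            X_new.append(a + lam * (b - a))
            y_new.append(c)
            g_new.append(grp)
    X_out = np.vstack([X, np.array(X_new).reshape(-1, X.shape[1])])
    y_out = np.concatenate([y, np.array(y_new, dtype=y.dtype)])
    g_out = np.concatenate([g, np.array(g_new, dtype=g.dtype)])
    return X_out, y_out, g_out


# Toy data: clusters (0,0) and (1,1) have 2 samples, (0,1) and (1,0) have 1.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [1.1, 0.9], [3.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])
g = np.array([0, 0, 1, 0, 1, 1])
X_bal, y_bal, g_bal = heterogeneous_oversample(X, y, g, mix="group")
# Each (class, group) cluster now has 2 samples.
print(Counter(zip(y_bal.tolist(), g_bal.tolist())))
```

Capping `lam` below 0.5 is one simple way to keep a mixed sample on the seed's side of the segment so that reusing the seed's labels is plausible; a method that models cluster distributions and noise, as the abstract describes, would replace this fixed cap with something data-dependent.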
