Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation
This work addresses bias in machine learning systems, a critical societal issue, but it appears incremental because it builds on existing methods for analyzing bias.
The paper introduces a synthetic data generator that creates data with specific biases and their combinations. It then analyzes the impact of these biases on performance and fairness metrics in both non-mitigated and mitigated models. The abstract reports no concrete numbers.
Machine learning applications are becoming increasingly pervasive in our society. Since these decision-making systems rely on data-driven learning, the risk is that they will systematically spread the bias embedded in data. In this paper, we propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations. We delve into the nature of these biases, discussing their relationship to moral and justice frameworks. Finally, we use our proposed synthetic data generator to perform experiments on different scenarios with various bias combinations. We thus analyze the impact of biases on performance and fairness metrics in both non-mitigated and mitigated machine learning models.
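To make the idea of generating data with a controlled bias concrete, the following is a minimal sketch and not the authors' generator. The abstract does not specify the bias types or parameters, so everything here is an assumption chosen for illustration: the function names, the binary sensitive attribute `group`, the `label_bias` parameter, and the choice of a label-flipping bias measured with a demographic parity difference.

```python
import random


def generate_biased_data(n, label_bias=0.0, seed=0):
    """Return a list of (group, score, label) tuples.

    Hypothetical setup, not taken from the paper:
    - group: binary sensitive attribute (0 or 1), drawn uniformly.
    - score: a feature drawn independently of the group.
    - label: 1 if score > 0.5, except that positive labels in group 1
      are flipped to 0 with probability `label_bias`.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.randint(0, 1)
        score = rng.random()
        label = 1 if score > 0.5 else 0
        # Inject the bias: some deserved positives in group 1 are lost.
        if group == 1 and label == 1 and rng.random() < label_bias:
            label = 0
        rows.append((group, score, label))
    return rows


def positive_rate(rows, group):
    """Fraction of positive labels within one group."""
    labels = [y for g, _, y in rows if g == group]
    return sum(labels) / len(labels)


def demographic_parity_difference(rows):
    """Positive rate of group 0 minus positive rate of group 1."""
    return positive_rate(rows, 0) - positive_rate(rows, 1)


if __name__ == "__main__":
    for bias in (0.0, 0.25, 0.5):
        data = generate_biased_data(20000, label_bias=bias)
        gap = demographic_parity_difference(data)
        print(f"label_bias={bias:.2f}  parity gap={gap:.3f}")
```

With `label_bias` set to zero the two groups should receive positive labels at nearly the same rate, and the gap should grow as the parameter increases. A model trained on such data could then be evaluated with and without a mitigation step, which is the kind of comparison the abstract describes.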