LG Feb 23, 2024
Debiasing Machine Learning Models by Using Weakly Supervised Learning
Renan D. B. Brotto, Jean-Michel Loubes, Laurent Risser et al.
We tackle the problem of mitigating bias in algorithmic decisions in a setting where both the output of the algorithm and the sensitive variable are continuous. Most prior work deals with discrete sensitive variables, meaning that biases are measured for subgroups of persons defined by a label. This leaves out important cases of algorithmic bias in which the sensitive variable is continuous. Typical examples are unfair decisions made with respect to age or financial status. We therefore propose a bias mitigation strategy for continuous sensitive variables, based on the notion of endogeneity, which comes from the field of econometrics. In addition to solving this new problem, our bias mitigation strategy is a weakly supervised learning method, which requires that a small portion of the data can be measured in a fair manner. It is model agnostic, in the sense that it makes no assumption about the prediction model. It also makes use of a reasonably large number of input observations and their corresponding predictions. Only a small fraction of the true output predictions needs to be known, which limits the need for expert intervention. Results obtained on synthetic data show the effectiveness of our approach on examples designed to be as close as possible to real-life applications in econometrics.
EM Feb 16, 2022
Fairness constraint in Structural Econometrics and Application to fair estimation using Instrumental Variables
Samuele Centorrino, Jean-Pierre Florens, Jean-Michel Loubes
A supervised machine learning algorithm determines a model from a learning sample that will be used to predict new observations. To this end, it aggregates individual characteristics of the observations of the learning sample. But this information aggregation does not consider any potential selection on unobservables or any status-quo biases which may be contained in the training sample. The latter bias has raised concerns about the so-called fairness of machine learning algorithms, especially towards disadvantaged groups. In this chapter, we review the issue of fairness in machine learning through the lens of structural econometric models in which the unknown index is the solution of a functional equation and issues of endogeneity are explicitly accounted for. We model fairness as a linear operator whose null space contains the set of strictly fair indexes. A fair solution is obtained by projecting the unconstrained index onto the null space of this operator or by directly finding the closest solution of the functional equation within this null space. We also acknowledge that policymakers may incur a cost when moving away from the status quo. Approximate fairness is obtained by introducing a fairness penalty in the learning procedure, which balances the influence of the status quo against that of a fully fair solution.
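The projection and penalty ideas in this abstract can be illustrated with a small finite-dimensional sketch. It is not the authors' estimator: it assumes, purely for illustration, that the index is a vector of fitted values `g`, and that the fairness operator is the empirical covariance between `g` and a continuous sensitive variable `s`. All function names and the penalty weight `rho` are hypothetical.

```python
# Toy sketch (assumption): fairness operator F g = cov(g, s), a single linear
# functional. Its null space is the set of indexes uncorrelated with s.

def centered(s):
    mean = sum(s) / len(s)
    return [v - mean for v in s]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fairness_operator(g, s):
    """Empirical covariance between the index g and the sensitive variable s."""
    return dot(centered(s), g) / len(g)

def project_fair(g, s):
    """Orthogonal projection of g onto the null space of the operator."""
    c = centered(s)
    coef = dot(c, g) / dot(c, c)
    return [gi - coef * ci for gi, ci in zip(g, c)]

def penalized_fair(g, s, rho):
    """Minimizer of ||h - g||^2 + rho * (c . h)^2, with c the centered s.

    rho = 0 returns the status quo g; rho -> infinity approaches the projection.
    """
    c = centered(s)
    coef = rho * dot(c, g) / (1.0 + rho * dot(c, c))
    return [gi - coef * ci for gi, ci in zip(g, c)]

s = [0.0, 1.0, 2.0, 3.0]   # hypothetical sensitive variable
g = [1.0, 2.0, 2.5, 4.0]   # hypothetical unconstrained index

g_fair = project_fair(g, s)
g_half = penalized_fair(g, s, rho=0.2)

print(fairness_operator(g, s))       # bias of the status quo
print(fairness_operator(g_half, s))  # reduced, but not zero
print(fairness_operator(g_fair, s))  # zero up to rounding
```

The closed form in `penalized_fair` follows from setting the gradient of the penalized objective to zero, so the penalty weight moves the solution continuously between the status quo and the strictly fair projection.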
ST Sep 11, 2017
Is completeness necessary? Estimation in nonidentified linear models
Andrii Babii, Jean-Pierre Florens
Modern data analysis depends increasingly on estimating models via flexible high-dimensional or nonparametric machine learning methods, where the identification of structural parameters is often challenging and untestable. In linear settings, this identification hinges on the completeness condition, which requires the nonsingularity of a high-dimensional matrix or operator and may fail for finite samples or even at the population level. Regularized estimators provide a solution by enabling consistent estimation of structural or average structural functions, sometimes even under identification failure. We show that the asymptotic distribution in these cases can be nonstandard. We develop a comprehensive theory of regularized estimators, which include methods such as high-dimensional ridge regularization, gradient descent, and principal component analysis (PCA). The results are illustrated for high-dimensional and nonparametric instrumental variable regressions and are supported through simulation experiments.
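To make the role of regularization under identification failure concrete, here is a noise-free toy sketch, not the estimators or the asymptotic theory of the paper. It assumes a hypothetical singular 2x2 matrix `K` and a right-hand side `r = K b0`, so `b0` is not identified. Ridge regularization with a small penalty and gradient descent started at zero both recover the minimum-norm solution, which is the part of `b0` orthogonal to the null space of `K`.

```python
# Toy sketch (assumption): population-level, noise-free linear model r = K b
# with a singular K, so that completeness fails.

def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def ridge(K, r, alpha):
    """Solve (alpha I + K^T K) b = K^T r."""
    KtK = [[sum(K[i][p] * K[i][q] for i in range(2)) + (alpha if p == q else 0.0)
            for q in range(2)] for p in range(2)]
    Ktr = [sum(K[i][p] * r[i] for i in range(2)) for p in range(2)]
    return solve_2x2(KtK, Ktr)

def gradient_descent(K, r, step, n_iter):
    """Iterate b <- b + step * K^T (r - K b), starting from zero."""
    b = [0.0, 0.0]
    for _ in range(n_iter):
        resid = [r[i] - sum(K[i][j] * b[j] for j in range(2)) for i in range(2)]
        grad = [sum(K[i][p] * resid[i] for i in range(2)) for p in range(2)]
        b = [b[p] + step * grad[p] for p in range(2)]
    return b

K = [[1.0, 1.0], [1.0, 1.0]]   # rank 1: only b[0] + b[1] is identified
b0 = [2.0, 0.0]                # hypothetical true parameter
r = [sum(K[i][j] * b0[j] for j in range(2)) for i in range(2)]

b_ridge = ridge(K, r, alpha=1e-4)
b_gd = gradient_descent(K, r, step=0.2, n_iter=50)  # step < 2 / largest eigenvalue of K^T K

print(b_ridge)  # close to the minimum-norm solution [1, 1], not to b0
print(b_gd)
```

In this toy case both methods return a well-defined limit even though `b0` itself cannot be recovered, while the identified combination `b[0] + b[1]` is estimated correctly.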