Bruno Scarone

h-index10
2papers

2 Papers

CYJul 22, 2025
The Impact of Pseudo-Science in Financial Loans Risk Prediction

Bruno Scarone, Ricardo Baeza-Yates

We study the societal impact of pseudo-scientific assumptions for predicting the behavior of people in a straightforward application of machine learning to risk prediction in financial lending. This use case also exemplifies the impact of survival bias in loan return prediction. We analyze the models in terms of their accuracy and social cost, showing that the socially optimal model may not imply a significant accuracy loss for this downstream task. Our results are verified for commonly used learning methods and datasets. Our findings also show that there is a natural dynamic when training models that suffer survival bias where accuracy slightly deteriorates, and whose recall and precision improves with time. These results act as an illusion, leading the observer to believe that the system is getting better, when in fact the model is suffering from increasingly more unfairness and survival bias.

LGMay 20, 2024
A Principled Approach for Data Bias Mitigation

Bruno Scarone, Alfredo Viola, Renée J. Miller et al.

The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. \emph{Bias} in the data can adversely affect this decision-making. We present a new mitigation strategy to address data bias. Our methods are explainable and come with mathematical guarantees of correctness. They can take advantage of new work on table discovery to find new tuples that can be added to a dataset to create real datasets that are unbiased or less biased. Our framework covers data with non-binary labels and with multiple sensitive attributes. Hence, we are able to measure and mitigate bias that does not appear over a single attribute (or feature), but only intersectionally, when considering a combination of attributes. We evaluate our techniques on publicly available datasets and provide a theoretical analysis of our results, highlighting novel insights into data bias.