ML LG · Dec 1, 2020

Fair Densities via Boosting the Sufficient Statistics of Exponential Families

Alexander Soen, Hisham Husain, Richard Nock

arXiv:2012.00188v42.73 citations · h-index: 35 · Has Code

Originality: Highly original

AI Analysis

This work addresses the problem of pre-processing data to ensure fairness, a step that matters to practitioners building fair machine learning models, and offers a new boosting-based approach.

This paper introduces a boosting algorithm to pre-process data for fairness, starting from an initial fair distribution and iteratively improving data fit while maintaining fairness. The method learns the sufficient statistics of an exponential family and is theoretically proven to provide representation rate and statistical rate data fairness guarantees.

We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial fair but inaccurate distribution, our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we are able to theoretically prove that the learned distribution will have a representation rate and statistical rate data fairness guarantee. Unlike recent optimization-based pre-processing methods, our approach can be easily adapted for continuous-domain features. Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness. Empirical results are presented to demonstrate the quality of the results on real-world data.
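The idea of starting from a fair distribution and moving towards the data can be illustrated on a toy example. The sketch below is not the authors' implementation. It assumes a discrete joint table `q[s, y]` over a binary sensitive attribute and a binary label, uses the min/max ratio forms of representation rate and statistical rate, and replaces the learned weak learner with a hand-picked score `c` in [-1, 1]. The names `tilt`, `c`, and `theta` are hypothetical. The update is a plain exponential tilt, `q_new ∝ q * exp(theta * c)`, which keeps the distribution inside an exponential family whose sufficient statistic is `c`.

```python
import numpy as np

def representation_rate(q):
    """Min/max ratio of the sensitive-attribute marginals q(S = s)."""
    p_s = q.sum(axis=1)
    return p_s.min() / p_s.max()

def statistical_rate(q, positive=1):
    """Min/max ratio of q(Y = positive | S = s) across groups s."""
    p_pos_given_s = q[:, positive] / q.sum(axis=1)
    return p_pos_given_s.min() / p_pos_given_s.max()

def tilt(q, c, theta):
    """One multiplicative update: q_new proportional to q * exp(theta * c)."""
    unnorm = q * np.exp(theta * c)
    return unnorm / unnorm.sum()

def kl(p, q):
    """KL divergence KL(p || q) for strictly positive tables."""
    return float(np.sum(p * np.log(p / q)))

# Toy "data" distribution p[s, y]: group s = 0 is over-represented.
p = np.array([[0.4, 0.3],
              [0.1, 0.2]])

# Initial distribution: uniform, so both fairness rates equal 1.
q0 = np.full((2, 2), 0.25)

# Hypothetical weak-learner output in [-1, 1]: +1 where p has more mass than q0.
c = np.sign(p - q0)

# Small step size (an assumption); smaller theta stays closer to q0.
theta = 0.2
q1 = tilt(q0, c, theta)

print("KL(p || q0):", kl(p, q0))
print("KL(p || q1):", kl(p, q1))
print("representation rate of q1:", representation_rate(q1))
print("statistical rate of q1:", statistical_rate(q1))
```

In this toy setting every cell is rescaled by a factor between `exp(-theta)` and `exp(theta)`, so the representation rate of `q1` cannot fall below `exp(-2 * theta)` times that of `q0`. This shows how a bounded step can trade data fit against a fairness floor. It is a property of this sketch, not a restatement of the paper's theorem.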
