FairPFN: Transformers Can do Counterfactual Fairness
This work addresses fairness issues in high-stakes domains like healthcare and finance by providing a method that removes the need for domain knowledge in causal modeling, though it is incremental as it builds on prior in-context learning and PFN techniques.
The paper tackles the practical limitations of counterfactual fairness in ML systems by introducing FairPFN, a transformer pretrained on synthetic fairness data to eliminate causal effects of protected attributes from observational data without requiring a correct causal model. The model's effectiveness is demonstrated on synthetic case studies and real-world datasets, paving the way for transformers in causal and counterfactual fairness research.
Machine learning systems are increasingly prevalent across healthcare, law enforcement, and finance, but they often operate on historical data that may carry biases against certain demographic groups. Causal and counterfactual fairness provide an intuitive way to define fairness that closely aligns with legal standards. Despite its theoretical benefits, counterfactual fairness comes with several practical limitations, largely related to its reliance on domain knowledge and approximate causal discovery techniques when constructing a causal model. In this study, we take a fresh perspective on counterfactually fair prediction, building upon recent work in in-context learning (ICL) and prior-fitted networks (PFNs) to learn a transformer called FairPFN. This model is pretrained on synthetic fairness data to eliminate the causal effects of protected attributes directly from observational data, removing the requirement of access to the correct causal model in practice. In our experiments, we thoroughly assess the effectiveness of FairPFN in eliminating the causal impact of protected attributes on a series of synthetic case studies and real-world datasets. Our findings pave the way for a new and promising research area: transformers for causal and counterfactual fairness.
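To make the notion of "eliminating the causal effect of a protected attribute" concrete, the following is a minimal sketch on a toy structural causal model. It is not the FairPFN prior or training procedure, which this abstract does not specify. The linear graph A -> X -> Y, the coefficients, and the names `sample_scm` and `counterfactual_gap` are all assumptions made for illustration. The sketch generates a biased observational dataset alongside the outcome that would arise if the effect of A were removed, then measures how much a predictor's output changes when A is flipped with the exogenous noise held fixed.

```python
import random

def sample_scm(n, effect_of_a=2.0, seed=0):
    """Sample a toy linear SCM A -> X -> Y (hypothetical, for illustration)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.randint(0, 1)          # protected attribute
        u_x = rng.gauss(0.0, 1.0)      # exogenous noise on X
        u_y = rng.gauss(0.0, 0.1)      # exogenous noise on Y
        x = effect_of_a * a + u_x      # observed feature, biased by A
        y = 1.5 * x + u_y              # observed outcome, biased via X
        y_fair = 1.5 * u_x + u_y       # outcome with the effect of A removed
        rows.append({"a": a, "x": x, "y": y, "y_fair": y_fair, "u_x": u_x})
    return rows

def counterfactual_gap(predict, rows, effect_of_a=2.0):
    """Mean absolute change in prediction when A is flipped, noise held fixed."""
    total = 0.0
    for r in rows:
        a_cf = 1 - r["a"]
        x_cf = effect_of_a * a_cf + r["u_x"]   # counterfactual feature value
        total += abs(predict(r["a"], r["x"]) - predict(a_cf, x_cf))
    return total / len(rows)

rows = sample_scm(1000)

# A predictor that ignores A as an input but still uses the biased feature X.
naive = lambda a, x: 1.5 * x
# An oracle that uses the known SCM to subtract the effect of A from X.
oracle_fair = lambda a, x: 1.5 * (x - 2.0 * a)

gap_naive = counterfactual_gap(naive, rows)
gap_fair = counterfactual_gap(oracle_fair, rows)
print(f"naive gap: {gap_naive:.3f}, oracle gap: {gap_fair:.3f}")
```

The naive predictor shows that simply dropping A from the inputs is not enough, because X still carries its effect. The oracle only achieves a near-zero gap because it is handed the true causal model; the premise of the abstract is that a pretrained transformer could approximate such a mapping from observational data alone, without that access.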