LG | AI | Jan 25

Causal Pre-training Under the Fairness Lens: An Empirical Study of TabPFN

arXiv:2601.17912v1
Originality: Synthesis-oriented
AI Analysis

This work examines the fairness implications of deploying causally pre-trained models such as TabPFN in practice. It identifies a gap: causal pre-training alone does not ensure algorithmic fairness.

The study evaluates the fairness properties of TabPFN, a foundation model for tabular data pre-trained on synthetic datasets generated from structural causal models. It finds that TabPFN achieves stronger predictive accuracy than the baselines and is robust to spurious correlations, but that its fairness improvements are moderate and inconsistent, especially under missing-not-at-random (MNAR) covariate shifts.

Foundation models for tabular data, such as the Tabular Prior-data Fitted Network (TabPFN), are pre-trained on a massive number of synthetic datasets generated by structural causal models (SCMs). They leverage in-context learning to offer high predictive accuracy in real-world tasks. However, the fairness properties of these foundation models, which incorporate ideas from causal reasoning during pre-training, have not yet been explored in sufficient depth. In this work, we conduct a comprehensive empirical evaluation of TabPFN and its fine-tuned variants, assessing predictive performance, fairness, and robustness across varying dataset sizes and distributional shifts. Our results reveal that while TabPFN achieves stronger predictive accuracy than the baselines and exhibits robustness to spurious correlations, its improvements in fairness are moderate and inconsistent, particularly under missing-not-at-random (MNAR) covariate shifts. These findings suggest that the causal pre-training in TabPFN is helpful but insufficient for algorithmic fairness, highlighting implications for deploying such models in practice and the need for further fairness interventions.
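
The abstract does not say which fairness metrics the study uses. As an illustration of what a group-fairness assessment can look like, the sketch below computes the demographic parity difference, a common choice and an assumption here, not the paper's confirmed metric. The function names and the toy predictions are hypothetical.

```python
from typing import Hashable, Sequence


def positive_rate(y_pred: Sequence[int], group: Sequence[Hashable], value: Hashable) -> float:
    """Share of positive predictions (1s) among rows whose group equals `value`."""
    selected = [p for p, g in zip(y_pred, group) if g == value]
    if not selected:
        raise ValueError(f"no rows found for group {value!r}")
    return sum(selected) / len(selected)


def demographic_parity_difference(y_pred: Sequence[int], group: Sequence[Hashable]) -> float:
    """Largest gap in positive-prediction rate between any two groups.

    0.0 means every group receives positive predictions at the same rate.
    """
    if len(y_pred) != len(group):
        raise ValueError("y_pred and group must have the same length")
    rates = [positive_rate(y_pred, group, v) for v in set(group)]
    return max(rates) - min(rates)


# Toy data (hypothetical): binary predictions and a sensitive attribute.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Group "a" is predicted positive 3 times out of 4, group "b" once out of 4.
gap = demographic_parity_difference(y_pred, group)
print(f"demographic parity difference: {gap:.2f}")
```

In an evaluation like the one described, a metric of this kind would be computed on each model's predictions, both on clean test data and on shifted data, so that fairness gaps can be compared alongside accuracy.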
