Evaluating Fairness in Self-supervised and Supervised Models for Sequential Data
This work addresses fairness in human-centric applications such as healthcare, where data is scarce, but it is incremental, as it builds on existing SSL and fairness research.
The study investigated whether self-supervised learning (SSL) models learn less biased representations than supervised models for sequential data, finding that SSL can achieve similar performance while improving fairness by up to 27% with only a 1% performance loss.
Self-supervised learning (SSL) has become the de facto training paradigm for large models, in which pre-training is followed by supervised fine-tuning on domain-specific data and labels. Hypothesizing that SSL models learn more generic, and hence less biased, representations, this study explores the impact of pre-training and fine-tuning strategies on fairness (i.e., performing equally across different demographic breakdowns). Motivated by human-centric applications on real-world time-series data, we interpret inductive biases at the model, layer, and metric levels by systematically comparing SSL models to their supervised counterparts. Our findings demonstrate that SSL can achieve performance on par with supervised methods while significantly enhancing fairness, exhibiting up to a 27% increase in fairness with a mere 1% loss in performance. Ultimately, this work underscores SSL's potential in human-centric computing, particularly in high-stakes, data-scarce application domains like healthcare.
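One way to read "performing equally across different demographic breakdowns" is to compute a model's performance separately for each demographic group and measure the spread between the best- and worst-served groups. The following is a minimal sketch under that assumption. The abstract does not state which metric underlies the reported 27% figure, so the choice of accuracy and of the max-min gap is an assumption. The function names (`per_group_accuracy`, `performance_gap`, `relative_gap_reduction`) and the toy labels are hypothetical and for illustration only.

```python
from typing import Dict, Hashable, Sequence


def per_group_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Accuracy computed separately for each demographic group (hypothetical helper)."""
    correct: Dict[Hashable, int] = {}
    total: Dict[Hashable, int] = {}
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(t == p)
    return {g: correct[g] / total[g] for g in total}


def performance_gap(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> float:
    """Best-group minus worst-group accuracy; 0.0 means equal performance.

    Assumption: fairness is quantified as this gap (smaller is fairer).
    """
    acc = per_group_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())


def relative_gap_reduction(baseline_gap: float, new_gap: float) -> float:
    """Fraction by which the gap shrinks relative to a baseline model's gap."""
    if baseline_gap == 0:
        raise ValueError("baseline gap is zero; relative reduction is undefined")
    return (baseline_gap - new_gap) / baseline_gap


if __name__ == "__main__":
    # Toy labels for two hypothetical groups "a" and "b"; not data from the study.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    pred_model_1 = [1, 0, 1, 1, 1, 0, 0, 0]  # perfect on "a", weaker on "b"
    pred_model_2 = [1, 0, 1, 0, 0, 0, 0, 0]  # same overall accuracy, balanced
    gap_1 = performance_gap(y_true, pred_model_1, groups)
    gap_2 = performance_gap(y_true, pred_model_2, groups)
    print(gap_1, gap_2, relative_gap_reduction(gap_1, gap_2))
```

In this toy case both models have the same overall accuracy (6 of 8 correct), but the second distributes its errors evenly across groups. This illustrates why a fairness measure has to look at per-group breakdowns rather than aggregate performance alone.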