Zhimin Mei

60.2LGApr 30

Fair Dataset Distillation via Cross-Group Barycenter Alignment

Mohammad Hossein Moslemi, Nima Hosseini Dashtbayaz, Zhimin Mei et al.

Dataset Distillation aims to compress a large dataset into a small synthetic one while maintaining predictive performance. We show that as different demographic groups exhibit distinct predictive patterns, the distillation process struggles to simultaneously preserve informative signals for all subgroups, regardless of whether group sizes are mildly or severely imbalanced. Consequently, models trained on distilled data can experience substantial performance drops for certain subgroups, leading to fairness gaps. Crucially, these gaps do not disappear by merely correcting group imbalance, since they stem from fundamental mismatches in subgroup predictive patterns rather than from sample-size disparities alone. We therefore formally analyze the interaction between these two sources of bias and cast the solution as identifying a group-imbalance-agnostic barycenter of the predictive information that induces similar representations across all subgroups. By distilling toward this shared aggregate representation, we show that group fairness concerns can be reduced. Our approach is compatible with existing distillation methods, and empirical results show that it substantially reduces bias introduced by dataset distillation.

LGMay 5, 2025

Early Prediction of Sepsis: Feature-Aligned Transfer Learning

Oyindolapo O. Komolafe, Zhimin Mei, David Morales Zarate et al.

Sepsis is a life threatening medical condition that occurs when the body has an extreme response to infection, leading to widespread inflammation, organ failure, and potentially death. Because sepsis can worsen rapidly, early detection is critical to saving lives. However, current diagnostic methods often identify sepsis only after significant damage has already occurred. Our project aims to address this challenge by developing a machine learning based system to predict sepsis in its early stages, giving healthcare providers more time to intervene. A major problem with existing models is the wide variability in the patient information or features they use, such as heart rate, temperature, and lab results. This inconsistency makes models difficult to compare and limits their ability to work across different hospitals and settings. To solve this, we propose a method called Feature Aligned Transfer Learning (FATL), which identifies and focuses on the most important and commonly reported features across multiple studies, ensuring the model remains consistent and clinically relevant. Most existing models are trained on narrow patient groups, leading to population bias. FATL addresses this by combining knowledge from models trained on diverse populations, using a weighted approach that reflects each models contribution. This makes the system more generalizable and effective across different patient demographics and clinical environments. FATL offers a practical and scalable solution for early sepsis detection, particularly in hospitals with limited resources, and has the potential to improve patient outcomes, reduce healthcare costs, and support more equitable healthcare delivery.

Zhimin Mei

2 Papers