LG CV GT NE

Dec 20, 2024

Fair Distributed Machine Learning with Imbalanced Data as a Stackelberg Evolutionary Game

Sebastian Niehaus, Ingo Roeder, Nico Scherf

arXiv:2412.16079v1 | 2.6 | h-index: 15

Originality: Highly original

AI Analysis

This work addresses fairness issues in distributed learning for medical applications, offering a novel game-theoretic approach to mitigate performance disparities caused by data imbalances.

The paper tackles the problem of data imbalances in decentralized machine learning, particularly in medical fields, by modeling it as a Stackelberg evolutionary game and proposing two weighting algorithms. The results show that the Adaptive Stackelberg Weighting Model (ASWM) improves performance for underrepresented nodes by 2.713% in AUC, with only a 0.441% average decrease for nodes with larger datasets.

Decentralised learning enables the training of deep learning algorithms without centralising datasets, resulting in benefits such as improved data privacy, operational efficiency and the fostering of data ownership policies. However, significant data imbalances pose a challenge in this framework. Participants with smaller datasets in distributed learning environments often achieve poorer results than participants with larger datasets. Data imbalances are particularly pronounced in medical fields and are caused by different patient populations, technological inequalities and divergent data collection practices. In this paper, we consider distributed learning as a Stackelberg evolutionary game. We present two algorithms for setting the weights of each node's contribution to the global model in each training round: the Deterministic Stackelberg Weighting Model (DSWM) and the Adaptive Stackelberg Weighting Model (ASWM). We use three medical datasets to highlight the impact of dynamic weighting on underrepresented nodes in distributed learning. Our results show that the ASWM significantly favours underrepresented nodes by improving their performance by 2.713% in AUC. Meanwhile, nodes with larger datasets experience only a modest average performance decrease of 0.441%.
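To make the idea of per-round contribution weights concrete, the following is a minimal Python sketch of weighted aggregation of node parameters. It is not the DSWM or the ASWM, whose update rules are not given in this abstract. The function names and the tempering exponent `alpha` are hypothetical, chosen only to show how moving away from size-proportional weights gives small nodes more influence on the global model.

```python
from typing import List


def tempered_weights(sizes: List[int], alpha: float) -> List[float]:
    """Return normalised contribution weights from node dataset sizes.

    Hypothetical rule for illustration only (not from the paper):
    alpha = 1.0 gives size-proportional weights,
    alpha = 0.0 gives uniform weights,
    values in between shift weight towards smaller nodes.
    """
    powered = [float(s) ** alpha for s in sizes]
    total = sum(powered)
    return [p / total for p in powered]


def aggregate(node_params: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted average of the nodes' parameter vectors (the global model)."""
    dim = len(node_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, node_params))
        for i in range(dim)
    ]


if __name__ == "__main__":
    sizes = [100, 900]                   # one small node, one large node
    params = [[1.0, 2.0], [3.0, 4.0]]    # toy parameter vectors per node

    for alpha in (1.0, 0.5, 0.0):
        w = tempered_weights(sizes, alpha)
        print(alpha, w, aggregate(params, w))
```

With `alpha = 1.0` the small node contributes in proportion to its share of the data, while lower values of `alpha` raise its share of the global update. An adaptive scheme would presumably adjust such weights from round to round based on observed performance, but how the ASWM does so is not described here.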
