LG AI

Aug 13, 2021

Online Fairness-Aware Learning with Imbalanced Data Streams

Vasileios Iosifidis, Wenbin Zhang, Eirini Ntoutsi

arXiv:2108.06231v1

7.59 citations

Originality: Incremental advance

AI Analysis

This paper addresses maintaining fairness in online applications such as network monitoring and job applications, where data evolves over time. It is an incremental improvement over existing fairness-aware stream classifiers.

The paper tackles the problem of maintaining fairness in online learning with imbalanced data streams, where existing methods fail to handle class imbalance effectively. The proposed online boosting approach achieves significant improvements over state-of-the-art methods, with relative increases of 11.2%-14.2% in balanced accuracy, 22.6%-31.8% in gmean, 42.5%-49.6% in recall, 14.3%-25.7% in kappa, and 89.4%-96.6% in statistical parity.
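The metrics named above can be computed from a binary confusion matrix plus a group-membership flag. The following is a minimal sketch, not code from the paper: the function name `stream_metrics`, its arguments, and the choice to report statistical parity as the positive-prediction rate of the non-protected group minus that of the protected group are assumptions made for illustration.

```python
from math import sqrt


def stream_metrics(y_true, y_pred, protected):
    """Compute summary metrics for binary labels (1 = positive class).

    protected[i] is True when instance i belongs to the protected group.
    All names here are hypothetical; this is an illustrative sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    # Recall on the positive class (TPR) and on the negative class (TNR).
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (po - pe) / (1 - pe) if pe != 1 else 0.0

    # Statistical parity (assumed definition): difference in the rate of
    # positive predictions between non-protected and protected groups.
    prot = [p for p, s in zip(y_pred, protected) if s]
    non_prot = [p for p, s in zip(y_pred, protected) if not s]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0

    return {
        "balanced_accuracy": (tpr + tnr) / 2,
        "gmean": sqrt(tpr * tnr),
        "recall": tpr,
        "kappa": kappa,
        "statistical_parity": rate(non_prot) - rate(prot),
    }


# Toy example: 2 positives and 6 negatives, half of the instances protected.
metrics = stream_metrics(
    y_true=[1, 1, 0, 0, 0, 0, 0, 0],
    y_pred=[1, 0, 0, 0, 0, 0, 0, 1],
    protected=[False, False, True, True, True, True, False, False],
)
```

Under this definition a statistical parity of 0 means both groups receive positive predictions at the same rate. Balanced accuracy and G-mean both combine the per-class recalls, which is why they are preferred over plain accuracy when classes are imbalanced.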

Data-driven learning algorithms are employed in many online applications in which data become available over time, such as network monitoring, stock price prediction, and job applications. The underlying data distribution might evolve over time, calling for model adaptation as new instances arrive and old instances become obsolete. In such dynamic environments, the so-called data streams, fairness-aware learning cannot be treated as a one-off requirement; it must be a continual requirement over the stream. Recent fairness-aware stream classifiers ignore the problem of class imbalance, which manifests in many real-life applications, and mitigate discrimination mainly because they "reject" minority instances at large due to their inability to effectively learn all classes. In this work, we propose an online fairness-aware approach that maintains a valid and fair classifier over the stream. Our method is an online boosting approach that changes the training distribution in an online fashion by monitoring the stream's class imbalance, and tweaks its decision boundary to mitigate discriminatory outcomes over the stream. Experiments on 8 real-world datasets and 1 synthetic dataset from different domains with varying class imbalance demonstrate the superiority of our method over state-of-the-art fairness-aware stream approaches, with relative increases in the range of 11.2%-14.2% in balanced accuracy, 22.6%-31.8% in gmean, 42.5%-49.6% in recall, 14.3%-25.7% in kappa, and 89.4%-96.6% in statistical parity (fairness).
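To make the idea of monitoring class imbalance and discrimination over a stream more concrete, here is a minimal sketch. It is not the paper's algorithm: the class names `ImbalanceMonitor` and `ParityMonitor`, the exponential `decay` parameter, and the inverse-frequency weighting rule are all assumptions chosen for illustration. The abstract does not specify how the training distribution is changed or how the decision boundary is adjusted.

```python
class ImbalanceMonitor:
    """Tracks decayed class counts over a binary stream (hypothetical helper)."""

    def __init__(self, decay=0.9):
        # decay < 1 makes old instances fade, so the estimate can follow drift;
        # decay = 1.0 keeps plain cumulative counts.
        self.decay = decay
        self.pos = 0.0
        self.neg = 0.0

    def update(self, label):
        self.pos = self.decay * self.pos + (1.0 if label == 1 else 0.0)
        self.neg = self.decay * self.neg + (1.0 if label == 0 else 0.0)

    def weight(self, label):
        """Return a training weight > 1 for the currently rarer class."""
        own, other = (self.pos, self.neg) if label == 1 else (self.neg, self.pos)
        if own == 0.0 or own >= other:
            return 1.0
        return other / own


class ParityMonitor:
    """Tracks cumulative statistical parity of predictions (hypothetical helper)."""

    def __init__(self):
        self.count = {True: 0, False: 0}      # instances seen per group
        self.positive = {True: 0, False: 0}   # positive predictions per group

    def update(self, pred, is_protected):
        self.count[is_protected] += 1
        self.positive[is_protected] += 1 if pred == 1 else 0

    def value(self):
        """Positive-prediction rate of non-protected minus protected group."""
        rate = lambda g: self.positive[g] / self.count[g] if self.count[g] else 0.0
        return rate(False) - rate(True)


# Toy stream: three negatives followed by one positive.
imbalance = ImbalanceMonitor(decay=1.0)
for label in [0, 0, 0, 1]:
    imbalance.update(label)

parity = ParityMonitor()
for pred, is_protected in [(1, False), (0, False), (0, True), (0, True)]:
    parity.update(pred, is_protected)
```

In a boosting setting, a weight such as `imbalance.weight(label)` could scale how strongly each arriving instance influences the weak learners, and a nonzero `parity.value()` could trigger a shift of the decision threshold for the protected group. Both uses are one possible reading of the abstract, not a description of the authors' implementation.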
