LGDSMLFeb 16, 2024

Collaborative Learning with Different Labeling Functions

arXiv:2402.10445v32 citationsh-index: 1ICML
AI Analysis

This work addresses the challenge of collaborative learning in scenarios where no single classifier fits all distributions, which is incremental as it builds on prior multi-task learning assumptions.

The paper tackles the problem of learning accurate classifiers for multiple data distributions with different labeling functions while minimizing total sample usage, showing that sample-efficient learning is feasible under a weaker realizability assumption. It presents an ERM-based algorithm with proven sample efficiency but NP-hard computational complexity, though efficient learners exist for two special cases.

We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions, while minimizing the number of samples drawn from them in total. Unlike in the usual collaborative learning setup, it is not assumed that there exists a single classifier that is simultaneously accurate for all distributions. We show that, when the data distributions satisfy a weaker realizability assumption, which appeared in [Crammer and Mansour, 2012] in the context of multi-task learning, sample-efficient learning is still feasible. We give a learning algorithm based on Empirical Risk Minimization (ERM) on a natural augmentation of the hypothesis class, and the analysis relies on an upper bound on the VC dimension of this augmented class. In terms of the computational efficiency, we show that ERM on the augmented hypothesis class is NP-hard, which gives evidence against the existence of computationally efficient learners in general. On the positive side, for two special cases, we give learners that are both sample- and computationally-efficient.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes