ML LGAug 23, 2025

On the sample complexity of semi-supervised multi-objective learning

Tobias Wegel, Geelon So, Junhyung Park, Fanny Yang

arXiv:2508.17152v11 citationsh-index: 3

Originality Incremental advance

AI Analysis

This addresses the challenge of efficiently learning multiple competing tasks with limited labeled data, which is incremental as it builds on existing MOL bounds by introducing semi-supervised insights.

The paper tackles the problem of sample complexity in semi-supervised multi-objective learning, showing that for some losses, increased model capacity unavoidably raises statistical costs, but for Bregman losses, unlabeled data can significantly reduce the need for labeled data, with a simple pseudo-labeling algorithm achieving these rates.

In multi-objective learning (MOL), several possibly competing prediction tasks must be solved jointly by a single model. Achieving good trade-offs may require a model class $\mathcal{G}$ with larger capacity than what is necessary for solving the individual tasks. This, in turn, increases the statistical cost, as reflected in known MOL bounds that depend on the complexity of $\mathcal{G}$. We show that this cost is unavoidable for some losses, even in an idealized semi-supervised setting, where the learner has access to the Bayes-optimal solutions for the individual tasks as well as the marginal distributions over the covariates. On the other hand, for objectives defined with Bregman losses, we prove that the complexity of $\mathcal{G}$ may come into play only in terms of unlabeled data. Concretely, we establish sample complexity upper bounds, showing precisely when and how unlabeled data can significantly alleviate the need for labeled data. These rates are achieved by a simple, semi-supervised algorithm via pseudo-labeling.

View on arXiv PDF

Similar