LG · Sep 18, 2025

Optimal Learning from Label Proportions with General Loss Functions

Lorne Applebaum, Travis Dick, Claudio Gentile, Haim Kaplan, Tomer Koren

arXiv:2509.15145v17.11 citations · h-index: 31

Originality: Incremental advance

AI Analysis

This paper addresses a partially supervised learning problem relevant to domains such as online advertising, offering incremental improvements in flexibility and performance.

The paper tackles the problem of Learning from Label Proportions (LLP), where training data has only average labels per group, by introducing a low-variance de-biasing method that improves sample complexity guarantees and shows empirical advantages over baselines on benchmark datasets.

Motivated by problems in online advertising, we address the task of Learning from Label Proportions (LLP). In this partially-supervised setting, training data consists of groups of examples, termed bags, for which we only observe the average label value. The main goal, however, remains the design of a predictor for the labels of individual examples. We introduce a novel and versatile low-variance de-biasing methodology to learn from aggregate label information, significantly advancing the state of the art in LLP. Our approach exhibits remarkable flexibility, seamlessly accommodating a broad spectrum of practically relevant loss functions across both binary and multi-class classification settings. By carefully combining our estimators with standard techniques, we substantially improve sample complexity guarantees for a large class of losses of practical relevance. We also empirically validate the efficacy of our proposed approach across a diverse array of benchmark datasets, demonstrating compelling empirical advantages over standard baselines.
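To make the LLP setting concrete, the following is a minimal sketch of how bags and their observed label proportions arise, together with the classical de-biased surrogate label k·α − (k − 1)·p for i.i.d. bags of size k. This is not the paper's low-variance estimator, which the abstract does not specify; the function names `make_bags` and `debiased_surrogate_labels`, the bag size, and the synthetic labels are all assumptions chosen for illustration.

```python
import numpy as np

def make_bags(y, bag_size, rng):
    """Randomly partition labels into equal-size bags (hypothetical helper).

    Returns the index matrix (one row per bag) and the average label of each
    bag, which is the only supervision an LLP learner observes.
    """
    n = len(y) - len(y) % bag_size           # drop the remainder so bags are full
    perm = rng.permutation(len(y))[:n]
    bags = perm.reshape(-1, bag_size)        # each row holds the indices of one bag
    proportions = y[bags].mean(axis=1)       # observed bag-level average label
    return bags, proportions

def debiased_surrogate_labels(proportions, bag_size, prior):
    """Classical de-biased surrogate label: k * alpha - (k - 1) * p.

    With i.i.d. bags and the true marginal p = E[y], the surrogate has the same
    conditional mean as the hidden label of any example in the bag, but its
    variance grows with the bag size k.
    """
    return bag_size * proportions - (bag_size - 1) * prior

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=10_000).astype(float)   # hidden individual labels
bags, alpha = make_bags(y, bag_size=8, rng=rng)

# Assumption: the marginal p is estimated from the bag proportions themselves.
p_hat = alpha.mean()
y_tilde = debiased_surrogate_labels(alpha, bag_size=8, prior=p_hat)
```

Each example in a bag would be trained against its bag's surrogate value in `y_tilde`. The variance of this surrogate grows with the bag size, which is the kind of cost a low-variance de-biasing method aims to reduce.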
