LG · DS · Feb 7, 2024

PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses

arXiv:2402.04987v1 · 6 citations · h-index: 61 · ICML
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving model accuracy in aggregate learning settings. The advance is rated incremental because it builds on existing bagging methods.

The paper tackles learning from aggregate responses by proposing PriorBoost, an adaptive algorithm that forms increasingly homogeneous bags of samples to improve model quality. In the reported experiments, it regularly achieves optimal model quality for event-level predictions.

This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called bags in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained $k$-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We then propose the PriorBoost algorithm, which adaptively forms bags of samples that are increasingly homogeneous with respect to (unobserved) individual responses to improve model quality. We study label differential privacy for aggregate learning, and we also provide extensive experiments showing that PriorBoost regularly achieves optimal model quality for event-level predictions, in stark contrast to non-adaptive algorithms.
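
To make the contrast between random and curated bags concrete, here is a minimal sketch on synthetic data. It is not the authors' implementation of PriorBoost. It assumes Gaussian features, a linear model fit by least squares on bag-mean labels (each event is assigned its bag's average response), and a simple two-stage split: a prior model is trained on random bags from the first half of the data, and the second half is bagged by sorting on the prior's predictions and cutting into contiguous equal-size groups. All function names, the data sizes, and the fixed coefficient vector are hypothetical choices for illustration.

```python
import numpy as np


def make_bags(scores, bag_size):
    """Sort samples by score and cut the order into contiguous bags."""
    order = np.argsort(scores, kind="stable")
    return [order[i:i + bag_size] for i in range(0, len(order), bag_size)]


def aggregate_labels(y, bags):
    """Replace each individual response with the mean response of its bag."""
    y_bar = np.empty_like(y, dtype=float)
    for bag in bags:
        y_bar[bag] = y[bag].mean()
    return y_bar


def fit_linear(X, targets):
    """Least-squares fit of targets on X (no intercept)."""
    theta, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return theta


def demo(n=4000, bag_size=10, seed=0):
    rng = np.random.default_rng(seed)
    theta_true = np.array([1.0, -2.0, 0.5])  # assumed ground truth
    X = rng.normal(size=(n, theta_true.size))
    y = X @ theta_true + 0.1 * rng.normal(size=n)

    half = n // 2
    X1, y1 = X[:half], y[:half]
    X2, y2 = X[half:], y[half:]

    # Stage 1: random bags (sorting by random scores shuffles the samples).
    random_bags = make_bags(rng.random(half), bag_size)
    theta_random = fit_linear(X1, aggregate_labels(y1, random_bags))

    # Stage 2: bags sorted by the prior's predictions, so samples with
    # similar predicted responses share a bag. Only bag means of y2 are used.
    curated_bags = make_bags(X2 @ theta_random, bag_size)
    theta_curated = fit_linear(X2, aggregate_labels(y2, curated_bags))

    scale = np.linalg.norm(theta_true)
    err_random = np.linalg.norm(theta_random - theta_true) / scale
    err_curated = np.linalg.norm(theta_curated - theta_true) / scale
    return err_random, err_curated


if __name__ == "__main__":
    err_random, err_curated = demo()
    print(f"relative error, random bags:  {err_random:.3f}")
    print(f"relative error, curated bags: {err_curated:.3f}")
```

In this toy setting the random-bag fit is heavily shrunk toward zero, because a bag mean carries little information about any single event, while the prior-sorted bags yield a much smaller parameter error. The sketch only illustrates the direction of the effect; it makes no claim about the optimality results or the privacy analysis in the paper.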

