Learning from Label Proportions by Learning with Label Noise
This work addresses multi-class learning from label proportions with a novel, theoretically sound solution that could benefit applications where instance-level labels are scarce, though it is incremental in that it builds on prior label noise methods.
The paper tackles the weakly supervised classification problem of learning from label proportions (LLP) for multi-class data by proposing a theoretically grounded algorithm based on a reduction to learning with label noise, using the forward correction loss. It demonstrates improved empirical performance across multiple datasets and architectures compared to leading existing methods.
Learning from label proportions (LLP) is a weakly supervised classification problem where data points are grouped into bags, and the label proportions within each bag are observed instead of the instance-level labels. The task is to learn a classifier that predicts the labels of future individual instances. Prior work on LLP for multi-class data has yet to develop a theoretically grounded algorithm. In this work, we provide a theoretically grounded approach to LLP based on a reduction to learning with label noise, using the forward correction (FC) loss of \citet{Patrini2017MakingDN}. We establish an excess risk bound and generalization error analysis for our approach, while also extending the theory of the FC loss, which may be of independent interest. Our approach demonstrates improved empirical performance in deep learning scenarios across multiple datasets and architectures, compared to the leading existing methods.
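For readers unfamiliar with the forward correction loss named above, the following is a minimal sketch of its usual form for a single example: the model's clean-class probabilities are pushed through a noise transition matrix before the negative log-likelihood of the observed (noisy) label is taken. The function names and the convention `T[i][j] = P(noisy = j | clean = i)` are assumptions made for this illustration. The sketch takes `T` as given and does not show how the paper derives it from bag-level label proportions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of real-valued logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_correction_loss(logits, noisy_label, T):
    """Forward-corrected cross-entropy for one example (illustrative sketch).

    logits      -- model scores for each clean class
    noisy_label -- index of the observed, possibly corrupted, label
    T           -- assumed transition matrix, T[i][j] = P(noisy = j | clean = i),
                   with each row summing to 1
    """
    p_clean = softmax(logits)
    # Probability the model assigns to the observed noisy label:
    # sum over clean classes i of P(clean = i) * P(noisy = noisy_label | clean = i)
    p_noisy = sum(T[i][noisy_label] * p_clean[i] for i in range(len(p_clean)))
    return -math.log(p_noisy)


# Hypothetical two-class example with an asymmetric noise matrix.
T_example = [[0.8, 0.2],
             [0.3, 0.7]]
loss_example = forward_correction_loss([0.0, 0.0], 0, T_example)
```

When `T` is the identity matrix, this reduces to the ordinary cross-entropy loss, which is a convenient sanity check for any implementation.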