Learning With Multi-Group Guarantees For Clusterable Subpopulations
This work addresses the need for robust fairness and performance guarantees in machine learning for clusterable subpopulations, offering a novel algorithmic improvement over existing methods.
The paper tackles the problem of ensuring prediction guarantees for meaningful subpopulations defined by clusters in a mixture model, proposing two formalisms for per-subgroup guarantees and a multi-objective algorithm that achieves an O(T^{1/2}) rate without requiring subpopulations to be well-separated, compared to a cluster-then-predict approach with O(T^{2/3}) rate and separability requirement.
A canonical desideratum for prediction problems is that performance guarantees should hold not just on average over the population, but also for meaningful subpopulations within the overall population. But what constitutes a meaningful subpopulation? In this work, we take the perspective that relevant subpopulations should be defined with respect to the clusters that naturally emerge from the distribution of individuals for which predictions are being made. In this view, a population refers to a mixture model whose components constitute the relevant subpopulations. We suggest two formalisms for capturing per-subgroup guarantees: first, by attributing each individual to the component from which they were most likely drawn, given their features; and second, by attributing each individual to all components in proportion to their relative likelihood of having been drawn from each component. Using online calibration as a case study, we study a multi-objective algorithm that provides guarantees for each of these formalisms by handling all plausible underlying subpopulation structures simultaneously, and achieve an $O(T^{1/2})$ rate even when the subpopulations are not well-separated. In comparison, the more natural cluster-then-predict approach that first recovers the structure of the subpopulations and then makes predictions suffers from a $O(T^{2/3})$ rate and requires the subpopulations to be separable. Along the way, we prove that providing per-subgroup calibration guarantees for underlying clusters can be easier than learning the clusters: separation between median subgroup features is required for the latter but not the former.