Learning to Generate Fair Clusters from Demonstrations
This work is significant for practitioners who need to generate fair and interpretable clusters but struggle with incomplete or proxy fairness constraints, offering a method to learn these constraints from expert demonstrations.
This paper addresses the problem of identifying the intended fairness constraint in clustering from limited expert demonstrations, where prescribed fairness constraints are often incomplete. The authors propose an algorithm to identify the fairness metric and generate clusters using existing techniques, and a greedy method for novel fairness metrics. Empirical evaluation on three real-world datasets shows the approach's effectiveness in identifying underlying fairness and interpretability constraints for generating fair and interpretable clusters.
Fair clustering is the process of grouping similar entities together, while satisfying a mathematically well-defined fairness metric as a constraint. Due to the practical challenges in precise model specification, the prescribed fairness constraints are often incomplete and act as proxies to the intended fairness requirement, leading to biased outcomes when the system is deployed. We examine how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert. Each demonstration is a clustering over a subset of the data. We present an algorithm to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques, and analyze its theoretical properties. To extend our approach to novel fairness metrics for which clustering algorithms do not currently exist, we present a greedy method for clustering. Additionally, we investigate how to generate interpretable solutions using our approach. Empirical evaluation on three real-world datasets demonstrates the effectiveness of our approach in quickly identifying the underlying fairness and interpretability constraints, which are then used to generate fair and interpretable clusters.