A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components
This addresses the problem of scalable and private clustering in distributed data settings for applications like healthcare or IoT, though it is incremental as it builds on EM methods.
The paper tackles federated clustering with an unknown number of clusters and heterogeneous client data by developing FedGEM, a federated generalized expectation-maximization algorithm, which achieves performance comparable to centralized EM and outperforms existing federated clustering methods in experiments.
We study the problem of federated clustering when the total number of clusters $K$ across clients is unknown, and the clients have heterogeneous but potentially overlapping cluster sets in their local data. To that end, we develop FedGEM: a federated generalized expectation-maximization algorithm for the training of mixture models with an unknown number of components. Our proposed algorithm relies on each of the clients performing EM steps locally, and constructing an uncertainty set around the maximizer associated with each local component. The central server utilizes the uncertainty sets to learn potential cluster overlaps between clients, and infer the global number of clusters via closed-form computations. We perform a thorough theoretical study of our algorithm, presenting probabilistic convergence guarantees under common assumptions. Subsequently, we study the specific setting of isotropic GMMs, providing tractable, low-complexity computations to be performed by each client during each iteration of the algorithm, as well as rigorously verifying assumptions required for algorithm convergence. We perform various numerical experiments, where we empirically demonstrate that our proposed method achieves comparable performance to centralized EM, and that it outperforms various existing federated clustering methods.