LGJan 24, 2022

Decentralized EM to Learn Gaussian Mixtures from Datasets Distributed by Features

arXiv:2201.09965v1
Originality Incremental advance
AI Analysis

This addresses privacy and computational bottlenecks for applications like user profiling, but it is incremental as it extends EM to a new data partitioning scheme.

The paper tackles the problem of learning Gaussian mixtures from data distributed by features, which is common in scenarios like user profiling, by proposing a decentralized EM algorithm (VP-EM) that matches centralized EM in federated setups and approximates it in peer-to-peer networks with exponentially vanishing error, outperforming existing benchmarks.

Expectation Maximization (EM) is the standard method to learn Gaussian mixtures. Yet its classic, centralized form is often infeasible, due to privacy concerns and computational and communication bottlenecks. Prior work dealt with data distributed by examples, horizontal partitioning, but we lack a counterpart for data scattered by features, an increasingly common scheme (e.g. user profiling with data from multiple entities). To fill this gap, we provide an EM-based algorithm to fit Gaussian mixtures to Vertically Partitioned data (VP-EM). In federated learning setups, our algorithm matches the centralized EM fitting of Gaussian mixtures constrained to a subspace. In arbitrary communication graphs, consensus averaging allows VP-EM to run on large peer-to-peer networks as an EM approximation. This mismatch comes from consensus error only, which vanishes exponentially fast with the number of consensus rounds. We demonstrate VP-EM on various topologies for both synthetic and real data, evaluating its approximation of centralized EM and seeing that it outperforms the available benchmark.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes