LGMLJan 28, 2023

Decentralized Entropic Optimal Transport for Distributed Distribution Comparison

arXiv:2301.12065v22 citationsh-index: 24
Originality Incremental advance
AI Analysis

This work addresses a privacy and efficiency challenge for distributed machine learning systems, though it appears incremental as it builds on existing entropic optimal transport methods.

The paper tackles the problem of comparing distributions when data is scattered across agents in a distributed system without direct sharing, proposing a decentralized entropic optimal transport method that is communication-efficient and privacy-preserving, with experiments showing effectiveness on synthetic and real-world tasks.

Distributed distribution comparison aims to measure the distance between the distributions whose data are scattered across different agents in a distributed system and cannot even be shared directly among the agents. In this study, we propose a novel decentralized entropic optimal transport (DEOT) method, which provides a communication-efficient and privacy-preserving solution to this problem with theoretical guarantees. In particular, we design a mini-batch randomized block-coordinate descent (MRBCD) scheme to optimize the DEOT distance in its dual form. The dual variables are scattered across different agents and updated locally and iteratively with limited communications among partial agents. The kernel matrix involved in the gradients of the dual variables is estimated by a decentralized kernel approximation method, in which each agent only needs to approximate and store a sub-kernel matrix by one-shot communication and without sharing raw data. Besides computing entropic Wasserstein distance, we show that the proposed MRBCD scheme and kernel approximation method also apply to entropic Gromov-Wasserstein distance. We analyze our method's communication complexity and, under mild assumptions, provide a theoretical bound for the approximation error caused by the convergence error, the estimated kernel, and the mismatch between the storage and communication protocols. In addition, we discuss the trade-off between the precision of the EOT distance and the strength of privacy protection when implementing our method. Experiments on synthetic data and real-world distributed domain adaptation tasks demonstrate the effectiveness of our method.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes