ML · LG · Jun 19, 2025

Identifying Heterogeneity in Distributed Learning

arXiv:2506.16394v3 · 4.5 · h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses the challenge of detecting heterogeneity in distributed learning and offers incremental improvements to communication-efficient methods, relevant to both researchers and practitioners.

The paper tackles the problem of identifying heterogeneous parameter components in distributed M-estimation with minimal data transmission. It proposes two tests, a re-normalized Wald test and an extreme contrast test, which are consistent under dense and sparse heterogeneity, respectively. A combination of the two is designed to keep power robust across sparsity levels, and numerical experiments compare the methods' FWER and power.

We study methods for identifying heterogeneous parameter components in distributed M-estimation with minimal data transmission. The first is based on a re-normalized Wald test, which is shown to be consistent as long as the number of distributed data blocks $K$ is of smaller order than the minimum block sample size and the heterogeneity is dense. The second is an extreme contrast test (ECT) based on the difference between the largest and smallest component-wise parameter estimates among the data blocks. By introducing a sample-splitting procedure, the ECT avoids the bias accumulation arising from the M-estimation procedures and is consistent even when $K$ is much larger than the sample size, provided the heterogeneity is sparse. The ECT procedure is easy to implement and communication-efficient. A combination of the Wald and extreme contrast tests is formulated to attain more robust power under varying levels of sparsity of the heterogeneity. We also conduct extensive numerical experiments to compare the family-wise error rate (FWER) and the power of the proposed methods. Additionally, we conduct a case study to demonstrate the implementation and validity of the proposed methods.
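To make the two statistics concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' procedure. It assumes the simplest M-estimator (each block's sample mean), a pooled within-block variance, a re-normalized Wald form $(W-(K-1))/\sqrt{2(K-1)}$ with $W$ roughly $\chi^2_{K-1}$ under the null, and one simple way to use sample splitting: pick the max/min blocks on the first half of each block, then contrast those blocks on the independent second half. FWER over the $p$ components is controlled here with a Bonferroni correction, and the "combination" simply splits $\alpha$ between the two tests. All function names (`col_mean`, `pooled_variance`, `renormalized_wald`, `extreme_contrast`, `identify`, `simulate_blocks`) are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean


def col_mean(block, j):
    """Sample mean of component j in one block (the assumed M-estimator)."""
    return fmean(row[j] for row in block)


def pooled_variance(blocks, j):
    """Pooled within-block variance of component j across all blocks."""
    ss, dof = 0.0, 0
    for block in blocks:
        m = col_mean(block, j)
        ss += sum((row[j] - m) ** 2 for row in block)
        dof += len(block) - 1
    return ss / dof


def renormalized_wald(blocks, j):
    """Return (W - (K-1)) / sqrt(2(K-1)); W is roughly chi^2_{K-1} under H0."""
    K = len(blocks)
    sizes = [len(b) for b in blocks]
    means = [col_mean(b, j) for b in blocks]
    grand = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)
    w = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    w /= pooled_variance(blocks, j)
    return (w - (K - 1)) / math.sqrt(2 * (K - 1))


def extreme_contrast(blocks, j):
    """Select max/min blocks on the first half, contrast them on the second half."""
    first = [b[: len(b) // 2] for b in blocks]
    second = [b[len(b) // 2:] for b in blocks]
    sel = [col_mean(b, j) for b in first]
    k_max = max(range(len(blocks)), key=sel.__getitem__)
    k_min = min(range(len(blocks)), key=sel.__getitem__)
    # Second-half estimates are independent of the selection step,
    # so the standardized contrast is approximately N(0, 1) under H0.
    diff = col_mean(second[k_max], j) - col_mean(second[k_min], j)
    var = pooled_variance(second, j)
    se = math.sqrt(var * (1 / len(second[k_max]) + 1 / len(second[k_min])))
    return diff / se


def identify(blocks, alpha=0.05):
    """One-sided tests per component; Bonferroni over p, alpha split for the combination."""
    p = len(blocks[0][0])
    z = NormalDist().inv_cdf
    crit = z(1 - alpha / p)
    crit_half = z(1 - alpha / (2 * p))
    wald = [renormalized_wald(blocks, j) for j in range(p)]
    ect = [extreme_contrast(blocks, j) for j in range(p)]
    return {
        "wald": [j for j in range(p) if wald[j] > crit],
        "ect": [j for j in range(p) if ect[j] > crit],
        "combined": [j for j in range(p)
                     if wald[j] > crit_half or ect[j] > crit_half],
    }


def simulate_blocks(K, n, seed):
    """Toy data: component 0 sparse, component 1 dense, components 2-4 homogeneous."""
    rng = random.Random(seed)
    blocks = []
    for k in range(K):
        shift = [3.0 if k == 0 else 0.0, 0.5 * (k % 2), 0.0, 0.0, 0.0]
        blocks.append([[s + rng.gauss(0, 1) for s in shift] for _ in range(n)])
    return blocks


print(identify(simulate_blocks(K=50, n=40, seed=0)))
```

In this toy setup the Wald-type statistic aggregates many small shifts and so tends to flag the dense component 1, while the extreme contrast concentrates on a single outlying block and flags the sparse component 0 strongly. In an actual distributed deployment, each machine would only need to send its block-level estimates and residual sums of squares rather than raw rows; the sketch keeps everything in one process for brevity.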

