Distributed Linear Regression with Compositional Covariates
This work addresses the problem of handling large-scale compositional data in distributed settings for statisticians and data scientists, representing an incremental improvement by adapting existing optimization methods to this specific domain.
The paper tackles distributed sparse penalized linear regression for massive compositional data. It proposes two distributed optimization algorithms based on the ADMM and CDMM frameworks, which achieve communication-efficient estimation, come with rigorous convergence proofs, and are validated through numerical experiments on synthetic and real data.
With the availability of extremely large data sets, distributed statistical methodology and computing for such data have become increasingly crucial in the big data area. In this paper, we focus on the distributed sparse penalized linear log-contrast model for massive compositional data. In particular, two distributed optimization techniques, one under a centralized topology and one under a decentralized topology, are proposed to solve two different constrained convex optimization problems. Both proposed algorithms are based on the frameworks of the Alternating Direction Method of Multipliers (ADMM) and the Coordinate Descent Method of Multipliers (CDMM, Lin et al., 2014, Biometrika). It is worth emphasizing that, in the decentralized topology, we introduce a distributed coordinate-wise descent algorithm based on Group ADMM (GADMM, Elgabli et al., 2020, Journal of Machine Learning Research) to obtain a communication-efficient regularized estimator. Correspondingly, the convergence theory of the proposed algorithms is rigorously established under some regularity conditions. Numerical experiments on both synthetic and real data are conducted to evaluate the proposed algorithms.
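To make the centralized setting concrete, the following is a minimal sketch of a generic consensus ADMM for an L1-penalized log-contrast regression with a zero-sum constraint on the coefficients. It is not the CDMM-based or GADMM-based algorithm proposed in the paper. The function names, the 1/n scaling of the least-squares loss, the values of `lam` and `rho`, and the Dirichlet-generated synthetic compositions are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def zero_sum_l1_prox(v, t, iters=100):
    """Minimize t*||z||_1 + 0.5*||z - v||^2 subject to sum(z) = 0.

    The minimizer has the form soft_threshold(v - nu, t) for a scalar
    shift nu; its sum is non-increasing in nu, so nu is found by bisection.
    """
    lo, hi = v.min() - t, v.max() + t
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if soft_threshold(v - nu, t).sum() > 0:
            lo = nu
        else:
            hi = nu
    return soft_threshold(v - 0.5 * (lo + hi), t)

def consensus_admm(Z_parts, y_parts, lam, rho=1.0, iters=500):
    """Generic consensus ADMM (illustrative, not the paper's algorithm).

    Each worker k holds (Z_k, y_k); a central node holds the global z.
    """
    K = len(Z_parts)
    n = sum(len(y) for y in y_parts)
    p = Z_parts[0].shape[1]
    # Each worker inverts its own regularized Gram matrix once.
    inv = [np.linalg.inv(Z.T @ Z / n + rho * np.eye(p)) for Z in Z_parts]
    Zty = [Z.T @ y / n for Z, y in zip(Z_parts, y_parts)]
    z = np.zeros(p)
    u = [np.zeros(p) for _ in range(K)]
    for _ in range(iters):
        # Local step: ridge-type solve on each worker.
        b = [inv[k] @ (Zty[k] + rho * (z - u[k])) for k in range(K)]
        # Central step: average, then apply the zero-sum L1 prox.
        v = np.mean([b[k] + u[k] for k in range(K)], axis=0)
        z = zero_sum_l1_prox(v, lam / (K * rho))
        # Dual step: scaled multiplier update on each worker.
        u = [u[k] + b[k] - z for k in range(K)]
    return z

# Synthetic compositional data (assumed setup): rows of X sum to one.
rng = np.random.default_rng(0)
n, p, K = 300, 8, 3
X = rng.dirichlet(np.ones(p), size=n)
Z = np.log(X)  # log-contrast design matrix
beta_true = np.array([1.0, -1.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0])
y = Z @ beta_true + 0.1 * rng.standard_normal(n)

# Split the rows across K workers and run the sketch.
beta_hat = consensus_admm(np.array_split(Z, K), np.array_split(y, K), lam=0.01)
print(np.round(beta_hat, 3), beta_hat.sum())
```

In this sketch each worker communicates only a p-dimensional vector per iteration, and the zero-sum constraint is enforced entirely in the central step, so the returned estimate satisfies it up to bisection accuracy.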