A Note on Stability for Orthogonalized Matrix Momentum with Client Sampling
For distributed learning with matrix parameters, this work provides generalization guarantees that account for client sampling and data heterogeneity, though the results are incremental and rely on regularity conditions on the orthogonalization map.
The paper derives finite-sample generalization bounds for a client-sampled distributed optimization scheme with matrix-valued parameters and orthogonalized momentum updates, achieving \(\widetilde{\mathcal O}(n^{-1}+n^{-1/2})\) scaling in the uniform full-participation, full-batch regime when the horizon-dependent amplification terms are controlled.
We study finite-sample generalization for a client-sampled distributed optimization scheme with matrix-valued parameters and orthogonalized momentum updates. The central quantity is the gap between the population and empirical objectives at the returned model when only a subset of clients participates in each round. Assuming independent, heterogeneous client data, unequal local sample counts, and fixed aggregation weights, we derive a finite-round upper-tail guarantee from a coupled-neighbor stability recursion combined with a weighted concentration step. The bound retains the client-selection counts through the amplification factor \(Y_i(\mathcal C)\); in the uniform full-participation, full-batch regime, it yields \(\widetilde{\mathcal O}(n^{-1}+n^{-1/2})\) scaling whenever the horizon-dependent amplification terms are controlled. The matrix-orthogonalization rule must be Lipschitz along paired trajectories, a condition satisfied by regularized polar-type maps and by normalized finite-step Newton--Schulz orthogonalizers. For the unregularized matrix sign, the same argument requires coupled spectral separation, whereas Gaussian smoothing yields a finite-round smoothed variant. A one-dimensional counterexample shows why a gap, smoothing, or regularity condition is necessary.
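To make the client-sampling part of the setting concrete, the sketch below draws a fixed number of distinct clients uniformly at random in each round, records how often each client was selected over the horizon, and combines matrix-valued client updates with fixed weights. It is a hypothetical illustration, not the paper's algorithm: the helper names (`sample_schedule`, `selection_counts`, `aggregate`), the choice of weights \(p_i = n_i / n\), and the renormalization of those weights over the sampled subset are all assumptions. The paper's amplification factor \(Y_i(\mathcal C)\) is not reproduced here; only the raw selection counts it depends on are computed.

```python
import random


def sample_schedule(num_clients, per_round, rounds, seed=0):
    """Draw `per_round` distinct clients uniformly at random in each round."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_clients), per_round)) for _ in range(rounds)]


def selection_counts(schedule, num_clients):
    """Number of rounds in which each client participated over the horizon."""
    counts = [0] * num_clients
    for chosen in schedule:
        for i in chosen:
            counts[i] += 1
    return counts


def aggregate(client_mats, weights, chosen):
    """Weighted combination of the participating clients' matrix updates.

    Assumption (hypothetical): the fixed weights are renormalized over the
    sampled subset so that the coefficients of the chosen clients sum to one.
    """
    total = sum(weights[i] for i in chosen)
    rows, cols = len(client_mats[chosen[0]]), len(client_mats[chosen[0]][0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in chosen:
        w = weights[i] / total
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * client_mats[i][r][c]
    return out


sizes = [120, 40, 40, 300, 500]  # hypothetical unequal local sample counts n_i
weights = [n / sum(sizes) for n in sizes]  # assumed fixed weights p_i = n_i / n
client_mats = [[[float(i), 0.0], [0.0, 1.0]] for i in range(5)]  # toy 2x2 updates
schedule = sample_schedule(num_clients=5, per_round=2, rounds=50)
print(selection_counts(schedule, 5))
print(aggregate(client_mats, weights, schedule[0]))
```

In the full-participation case (`per_round == num_clients`) every client is selected in every round, so each count equals the number of rounds; partial participation makes the counts random, which is the dependence the bound keeps track of.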
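As one example of an orthogonalizer covered by the Lipschitz condition, the following sketch implements a normalized finite-step Newton--Schulz iteration in pure Python. It assumes the classical cubic update \(X \leftarrow \tfrac32 X - \tfrac12 X X^\top X\) after scaling by the Frobenius norm; practical optimizers often use tuned higher-order polynomials instead, and the paper's exact variant may differ. Because the number of steps is fixed, the map is a polynomial in \(M/\|M\|_F\), which is Lipschitz on any region where \(\|M\|_F\) stays bounded away from zero. The helper names are illustrative.

```python
import math


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frobenius(a):
    return math.sqrt(sum(x * x for row in a for x in row))


def newton_schulz(m, steps=10):
    """Normalized finite-step cubic Newton-Schulz orthogonalizer (sketch).

    Dividing by the Frobenius norm puts every singular value in (0, 1], inside
    the convergence region (0, sqrt(3)) of X <- 1.5 X - 0.5 X X^T X. Each step
    pushes the singular values toward 1 while keeping the singular vectors, so
    the iterate approaches the polar factor U V^T of a full-rank input.
    """
    norm = frobenius(m)
    if norm == 0.0:
        raise ValueError("the zero matrix has no polar factor")
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x


# M = [[0, 2], [1, 0]] has polar factor [[0, 1], [1, 0]].
q = newton_schulz([[0.0, 2.0], [1.0, 0.0]])
print([[round(v, 6) for v in row] for row in q])  # → [[0.0, 1.0], [1.0, 0.0]]
```

With too few steps the output is only approximately orthogonal; the stability argument needs the map to be Lipschitz, not exactly orthogonal, so a fixed small step count is admissible.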
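The role of the gap, smoothing, or regularity condition can be seen in the scalar case, where the matrix sign reduces to the ordinary sign. The sketch below is an illustration in the spirit of the one-dimensional counterexample, not necessarily the paper's construction: two coupled inputs \(\delta\) and \(-\delta\) straddle zero, and we compare the output gap with the input gap. The hard sign has ratio \(1/\delta\), which is unbounded as \(\delta \to 0\); the regularized map \(m/\sqrt{m^2+\varepsilon}\) is Lipschitz with constant \(\varepsilon^{-1/2}\); and the Gaussian-smoothed sign \(\mathbb E[\operatorname{sign}(m+\sigma Z)] = \operatorname{erf}\bigl(m/(\sigma\sqrt2)\bigr)\) is Lipschitz with constant \(\sqrt{2/\pi}/\sigma\). The function names and the values of \(\varepsilon\) and \(\sigma\) are hypothetical.

```python
import math


def hard_sign(m):
    """Scalar matrix sign: discontinuous at 0."""
    if m > 0:
        return 1.0
    if m < 0:
        return -1.0
    return 0.0


def regularized_sign(m, eps):
    """Scalar analogue of m (m^2 + eps)^(-1/2); Lipschitz with constant eps**-0.5."""
    return m / math.sqrt(m * m + eps)


def smoothed_sign(m, sigma):
    """E[sign(m + sigma Z)] for Z ~ N(0, 1), which equals erf(m / (sigma sqrt 2)).

    Its Lipschitz constant is sqrt(2 / pi) / sigma, attained at m = 0.
    """
    return math.erf(m / (sigma * math.sqrt(2.0)))


def pair_ratio(f, delta):
    """Output gap divided by input gap for the coupled pair (delta, -delta)."""
    return abs(f(delta) - f(-delta)) / (2.0 * delta)


for delta in (1e-1, 1e-3, 1e-6):
    print(
        delta,
        pair_ratio(hard_sign, delta),
        pair_ratio(lambda m: regularized_sign(m, 1e-2), delta),
        pair_ratio(lambda m: smoothed_sign(m, 1e-1), delta),
    )
```

A gap condition works differently: if both coupled inputs stay on the same side of zero with \(|m| \ge \gamma > 0\), the hard sign returns identical outputs, which is loosely the scalar counterpart of the coupled spectral separation required for the unregularized matrix sign.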