LG, AI · Sep 3, 2025

Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients

Gwen Legate, Irina Rish, Eugene Belilovsky

arXiv:2509.03503v1 · 4.1 · h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses system-induced bias in federated learning by enabling under-represented, low-resource clients to participate in training. The advance is incremental, as it builds on existing zeroth-order methods such as MeZO.

The paper tackles the problem of low-resource edge devices being excluded from federated learning due to memory and communication constraints, and introduces ZOWarmUp, a zeroth-order optimizer that enables training from random initialization, improving data access and diversity for systems with many such devices.

Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data; however, memory and communication constraints on these edge devices may preclude their participation in training. We consider a setting in which a subset of edge devices falls below a critical memory or communication threshold required to conduct model updates. Under typical federated optimization algorithms, these devices are excluded from training, which renders their data inaccessible and increases system-induced bias. We are inspired by MeZO, a zeroth-order method used for memory-efficient fine-tuning. The increased variance inherent to zeroth-order gradient approximations has relegated previous zeroth-order optimizers exclusively to the domain of fine-tuning, a limitation we seek to correct. We devise a federated, memory-efficient zeroth-order optimizer, ZOWarmUp, that permits zeroth-order training from a random initialization. ZOWarmUp leverages differing client capabilities and careful variance reduction techniques to facilitate participation of under-represented, low-resource clients in model training. Like other federated zeroth-order methods, ZOWarmUp eliminates the need for edge devices to transmit their full gradients to the server and instead relies on only a small set of random seeds, rendering the up-link communication cost negligible. We present experiments using various datasets and model architectures to show that ZOWarmUp is a robust algorithm that can be applied under a wide variety of circumstances. For systems with a high proportion of edge devices that would otherwise be excluded from training, this algorithm provides access to a greater volume and diversity of data, thus improving training outcomes.
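
To make the seed-based communication idea concrete, here is a minimal Python sketch of a generic two-point (SPSA-style) zeroth-order update of the kind MeZO popularized. It is not the ZOWarmUp algorithm itself: the function names, the toy quadratic loss, the seed schedule, and the hyperparameters (`eps`, `lr`, client count) are all assumptions made for illustration, and the paper's warm-up phase and variance reduction techniques are not modeled. Each client evaluates the loss at two perturbed points and returns a single scalar; the server regenerates the perturbation from the shared seed, so no gradient vector is ever sent up-link.

```python
import random


def perturbation(seed, dim):
    """Regenerate the Gaussian direction z from a seed (same on client and server)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def client_step(loss_fn, params, seed, eps=1e-3):
    """Two-point zeroth-order estimate: forward passes only, returns one scalar."""
    z = perturbation(seed, len(params))
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    return (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)


def server_update(params, messages, lr):
    """Apply the averaged update from (seed, scalar) pairs sent by clients."""
    new_params = list(params)
    for seed, g in messages:
        z = perturbation(seed, len(params))  # rebuilt from the seed, never transmitted
        for i, zi in enumerate(z):
            new_params[i] -= lr * g * zi / len(messages)
    return new_params


def quadratic_loss(params):
    """Toy objective (hypothetical stand-in for a model loss)."""
    return sum(p * p for p in params)


def run_demo(rounds=200, clients=4, dim=5, lr=0.05):
    """Simulate federated rounds in which each client uploads only a seed and a scalar."""
    params = [1.0] * dim
    for r in range(rounds):
        messages = []
        for c in range(clients):
            seed = r * clients + c  # assumed seed schedule, for illustration only
            messages.append((seed, client_step(quadratic_loss, params, seed)))
        params = server_update(params, messages, lr)
    return params
```

In this sketch the up-link payload per client per round is one integer seed and one float, regardless of the model dimension `dim`, which is the property the abstract describes as making up-link cost negligible. The price is a noisier update direction, which is why the abstract emphasizes variance reduction when training from random initialization.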
