DC · May 4

HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters

arXiv:2509.24859 · 5.7 · h-index: 14
Predicted impact: top 33% in DC · last 90 days
Originality: Highly original
AI Analysis

This work addresses resource underutilization in distributed training on heterogeneous GPU clusters, a practical issue for organizations that run mixed hardware.

HARP is an automated parallel training framework for heterogeneous GPU clusters. It combines a fine-grained planner for inter-operator parallelism with a heterogeneity-aware 1F1B scheduler, and the authors report 1.3x-1.6x higher performance than state-of-the-art training frameworks on heterogeneous clusters.

With the rapid evolution of GPU architectures, the heterogeneity of model training infrastructures is steadily increasing. In such environments, effectively utilizing all available heterogeneous accelerators becomes critical for distributed model training. However, existing frameworks, which are primarily designed for homogeneous clusters, often exhibit significant resource underutilization when deployed on heterogeneous accelerators and networks. In this paper, we present Harp, an automated parallel training framework designed specifically for heterogeneous clusters. Harp introduces a fine-grained planner that efficiently searches a wide space for the inter-operator parallel strategy, enabling Harp to alleviate communication overheads while maintaining balanced loads across heterogeneous accelerators. In addition, Harp implements a heterogeneity-aware 1F1B scheduler that adaptively adjusts the execution timing and ordering of microbatches based on network characteristics, maximizing computation-communication overlap under cross-cluster interconnects while incurring only minimal memory overhead. Our evaluation results show that Harp can deliver 1.3x-1.6x higher performance on heterogeneous clusters than state-of-the-art training frameworks.
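To make the load-balancing idea in the abstract concrete, here is a minimal sketch of splitting a model's layers into contiguous pipeline stages across devices of different speeds so that the slowest stage is as fast as possible. It is an illustration only and not Harp's planner: the function name `partition_layers`, the per-layer cost list, and the relative device speeds are all hypothetical, and the sketch ignores communication cost, memory limits, and device ordering, which the paper's planner is described as taking into account.

```python
from functools import lru_cache


def partition_layers(layer_costs, device_speeds):
    """Assign contiguous layers to devices (in the given order) so that the
    slowest stage's compute time, cost / speed, is minimized.

    Hypothetical helper for illustration; not part of Harp.
    Returns (bottleneck_time, stages), where stages[d] lists layer indices.
    """
    n, k = len(layer_costs), len(device_speeds)
    if k == 0 or k > n:
        raise ValueError("need between 1 and len(layer_costs) devices")

    # prefix[i] = total cost of layers 0..i-1, so a stage cost is a difference.
    prefix = [0.0]
    for cost in layer_costs:
        prefix.append(prefix[-1] + cost)

    @lru_cache(maxsize=None)
    def best(i, d):
        """Best bottleneck for placing layers i..n-1 on devices d..k-1."""
        if d == k - 1:
            # The last device takes every remaining layer.
            return (prefix[n] - prefix[i]) / device_speeds[d], (n,)
        result = (float("inf"), ())
        # Leave at least one layer for each of the remaining devices.
        for j in range(i + 1, n - (k - d - 1) + 1):
            stage_time = (prefix[j] - prefix[i]) / device_speeds[d]
            rest_time, cuts = best(j, d + 1)
            candidate = max(stage_time, rest_time)
            if candidate < result[0]:
                result = (candidate, (j,) + cuts)
        return result

    bottleneck, cuts = best(0, 0)
    bounds = (0,) + cuts
    stages = [list(range(bounds[s], bounds[s + 1])) for s in range(k)]
    return bottleneck, stages


if __name__ == "__main__":
    # Eight equal-cost layers; the first device is assumed twice as fast.
    print(partition_layers([1.0] * 8, [2.0, 1.0, 1.0]))
```

With eight equal-cost layers and a first device that is twice as fast as the other two, the sketch gives four layers to the fast device and two to each of the others, so every stage takes the same time. A homogeneous split of the same layers would leave the fast device idle while it waits for the slower stages, which is the kind of underutilization the abstract describes.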

