LG, DC, PL | Jan 20, 2024

PartIR: Composing SPMD Partitioning Strategies for Machine Learning

arXiv:2401.11202v4 | 14 citations | ASPLOS
Originality: Incremental advance
AI Analysis

This work addresses the problem of complex parallelization in machine learning training for researchers and practitioners, offering an incremental improvement in partitioning tools.

The paper tackles the challenge of efficiently partitioning large neural networks for training by introducing PartIR, a system that allows composition of sharding strategies and provides predictable performance estimates, demonstrating its effectiveness across various models.

Training of modern large neural networks (NN) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding. When strategies increase in complexity, it becomes necessary for partitioning tools to be 1) expressive, allowing the composition of simpler strategies, and 2) predictable to estimate performance analytically. We present PartIR, our design for a NN partitioning system. PartIR is focused on an incremental approach to rewriting and is hardware-and-runtime agnostic. We present a simple but powerful API for composing sharding strategies and a simulator to validate them. The process is driven by high-level programmer-issued partitioning tactics, which can be both manual and automatic. Importantly, the tactics are specified separately from the model code, making them easy to change. We evaluate PartIR on several different models to demonstrate its predictability, expressibility, and ability to reach peak performance.
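
To make the idea of composable tactics kept separate from the model more concrete, here is a minimal sketch in plain Python. It is not PartIR's API: the names `MESH`, `TENSORS`, `shard`, `compose`, and `per_device_elements` are all hypothetical, and the "estimate" is only a per-device element count, a stand-in for the kind of analytical prediction the abstract mentions.

```python
from math import prod
from typing import Callable, Dict

# All names below are hypothetical illustrations, not PartIR's API.
MESH = {"batch": 4, "model": 2}  # mesh axis name -> number of devices

# Tensor name -> shape. The "model" is described here, apart from the tactics.
TENSORS = {
    "x": (32, 512),      # activations: (batch, features)
    "w1": (512, 2048),   # first layer weights
    "w2": (2048, 512),   # second layer weights
}

# A sharding maps tensor name -> {dimension index: mesh axis}.
Sharding = Dict[str, Dict[int, str]]
Tactic = Callable[[Sharding], Sharding]


def shard(tensor: str, dim: int, axis: str) -> Tactic:
    """Build a tactic that shards one tensor dimension along one mesh axis."""
    def apply(state: Sharding) -> Sharding:
        if TENSORS[tensor][dim] % MESH[axis] != 0:
            raise ValueError(f"{tensor} dim {dim} is not divisible by axis {axis}")
        used = state.get(tensor, {})
        if dim in used or axis in used.values():
            raise ValueError(f"{tensor}: dim {dim} or axis {axis} already in use")
        # Copy so that each tactic is an incremental, side-effect-free rewrite.
        new_state = {name: dict(dims) for name, dims in state.items()}
        new_state.setdefault(tensor, {})[dim] = axis
        return new_state
    return apply


def compose(*tactics: Tactic) -> Tactic:
    """Apply tactics left to right, each refining the previous result."""
    def apply(state: Sharding) -> Sharding:
        for tactic in tactics:
            state = tactic(state)
        return state
    return apply


def per_device_elements(state: Sharding) -> Dict[str, int]:
    """Analytical estimate: elements each device holds for every tensor."""
    return {
        name: prod(shape) // prod(MESH[a] for a in state.get(name, {}).values())
        for name, shape in TENSORS.items()
    }


# Two simple strategies, written without touching TENSORS, then composed.
data_parallel = shard("x", 0, "batch")
model_parallel = compose(shard("w1", 1, "model"), shard("w2", 0, "model"))
plan = compose(data_parallel, model_parallel)({})
estimate = per_device_elements(plan)
```

Because each tactic is just a function over the sharding state, swapping `model_parallel` for a different strategy leaves the tensor definitions untouched, which is the separation of concerns the abstract describes.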

Code Implementations: 1 repo