Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML
Addresses the lack of workload representation for exploring distributed ML system designs across the stack.
Flint uses ML compiler intermediate representations to enable flexible design space exploration for distributed ML systems, validated against post-execution traces.
Design space exploration for future distributed Machine Learning systems suffers from a lack of readily available workload representations that enable flexible exploration across the stack. We present Flint, a framework that bridges this gap by leveraging the Intermediate Representation (IR) of Machine Learning framework compilers. The compiler does the heavy lifting of understanding and preserving the behavior of the original model code. Because Flint interfaces with the compiler before hardware execution, it can collect the workload representation for clusters of arbitrary size without access to a physical cluster. We validate the workload graph against post-execution traces and demonstrate the flexibility of Flint through a design space exploration case study.
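The following minimal Python sketch shows why a workload graph that is captured before execution supports cluster-free exploration. Once the graph exists, cluster size becomes an input to a cost model rather than a property of physical hardware. Everything here is hypothetical and not Flint's actual representation or API: `Op`, `estimate_step_time`, the two op kinds, and the serial cost model. The sketch also assumes, for simplicity, that the per-device program stays the same across cluster sizes and only the collective cost changes. The all-reduce term uses the standard ring all-reduce traffic of 2(p-1)/p times the payload per device.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    """One node of a hypothetical workload graph (illustrative only)."""
    name: str
    kind: str                   # hypothetical op kinds: "compute" or "all_reduce"
    flops: float = 0.0          # per-device FLOPs for compute ops
    payload_bytes: float = 0.0  # per-device payload for collective ops


def estimate_step_time(graph: List[Op], num_devices: int,
                       flops_per_s: float, link_bytes_per_s: float) -> float:
    """Sum per-op latencies under a simple serial, analytical cost model."""
    total = 0.0
    for op in graph:
        if op.kind == "compute":
            total += op.flops / flops_per_s
        elif op.kind == "all_reduce":
            # Ring all-reduce: each device moves 2*(p-1)/p of the payload.
            p = num_devices
            total += 2 * (p - 1) / p * op.payload_bytes / link_bytes_per_s
        else:
            raise ValueError(f"unknown op kind: {op.kind}")
    return total


# Hypothetical two-op graph: one matmul followed by a gradient all-reduce.
example_graph = [
    Op("matmul", "compute", flops=2e12),
    Op("grad_allreduce", "all_reduce", payload_bytes=1e9),
]

if __name__ == "__main__":
    # Sweep cluster sizes without needing any of those devices to exist.
    for p in (1, 8, 64, 1024):
        t = estimate_step_time(example_graph, p,
                               flops_per_s=1e14, link_bytes_per_s=1e11)
        print(f"{p:>5} devices: {t * 1e3:.2f} ms/step")
```

A real IR-derived graph would carry far more detail, such as dependencies, overlap between compute and communication, and sharding-dependent shapes. The point of the sketch is only that the sweep over `p` requires no cluster.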