DC · AI · Apr 2

ModTrans: Translating Real-world Models for Distributed Training Simulator

arXiv:2604.01607 · 9.6 · h-index: 1
AI Analysis

This work lowers a barrier for ML researchers and systems researchers by making it easier to simulate distributed training without physical resources. The contribution is incremental, since it builds on existing simulators.

The authors address the lack of support for real-world models in distributed training simulators by developing ModTrans, a translator that converts any real-world model into the input format of the ASTRA-sim simulator. Their experiments show that the translation cost is negligible.

Large-scale distributed training has been an active research area in machine learning systems, in both industry and academia, in recent years. However, conducting experiments is difficult without physical machines and the corresponding resources. One solution is to use distributed training simulators, but current ones such as ASTRA-sim do not support importing models developed in real-world settings, which poses a challenge for ML researchers who want to use them. To address this challenge, we developed ModTrans, a translator that converts any real-world model into the ASTRA-sim simulator's input format, removing the barrier between machine learning experts and machine learning systems researchers. The experimental results show that ModTrans's cost is negligible.

