4 Papers

50.2 DC Apr 13
Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO

Jonas Svedas, Nathan Laubeuf, Ryan Harvey et al.

Predicting the performance of large-scale distributed machine learning (ML) workloads across multiple accelerator architectures remains a central challenge in ML system design. Existing GPU- and TPU-focused simulators are typically architecture-specific, while distributed training simulators rely on workload-specific analytical models or costly post-execution traces, limiting portability and cross-platform comparison. This work evaluates whether MLIR's StableHLO dialect can serve as a unified workload representation for cross-architecture and cross-fidelity performance modeling of distributed ML workloads. The study establishes a StableHLO-based simulation methodology that maps a single workload representation onto multiple performance models, spanning analytical, profiling-based, and simulator-driven predictors. Using this methodology, workloads are evaluated across GPUs and TPUs without requiring access to scaled-out physical systems, enabling systematic comparison across modeling fidelities. An empirical evaluation covering distributed GEMM kernels, ResNet, and large language model training workloads demonstrates that StableHLO preserves relative performance trends across architectures and fidelities, while exposing accuracy trade-offs and simulator limitations. Across evaluated scenarios, prediction errors remain within practical bounds for early-stage design exploration, and the methodology reveals fidelity-dependent limitations in existing GPU simulators. These results indicate that StableHLO provides a viable foundation for unified, distributed ML performance modeling across accelerator architectures and simulators, supporting reusable evaluation workflows and cross-validation throughout the ML system design process.
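To make the idea of an analytical predictor concrete, the following is a minimal sketch of a generic roofline-style estimate for a single GEMM. It is not the model used in the paper: the `Device` class, its fields, and the traffic assumption (each matrix moves through memory exactly once) are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical accelerator description; real values must come from a datasheet."""
    name: str
    peak_flops: float      # peak throughput in FLOP/s
    mem_bandwidth: float   # memory bandwidth in bytes/s


def gemm_roofline_seconds(m, n, k, dev, bytes_per_elem=2):
    """Lower-bound runtime of an (m x k) @ (k x n) GEMM under a roofline model.

    Assumes (for illustration only) that A, B, and C each cross the memory
    interface once, and that compute and memory traffic overlap perfectly.
    """
    flops = 2.0 * m * n * k                               # one multiply + one add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    compute_time = flops / dev.peak_flops
    memory_time = bytes_moved / dev.mem_bandwidth
    return max(compute_time, memory_time)                 # the slower resource dominates
```

Evaluating the same `(m, n, k)` shape against two `Device` instances gives a coarse cross-architecture comparison of the kind an analytical predictor provides; profiling-based and simulator-driven predictors would replace this function with measured or simulated timings.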

RO Sep 10, 2021
SMARRT: Self-Repairing Motion-Reactive Anytime RRT for Dynamic Environments

Zongyuan Shen, James Wilson, Ryan Harvey et al.

This paper addresses the fast replanning problem in dynamic environments with moving obstacles. Because the future states of randomly moving obstacles are unpredictable, the proposed method, called SMARRT, reacts to obstacle motions and revises the path in real time based on the current state (i.e., position and velocity) of each interfering obstacle. For efficiency, SMARRT performs collision checking only on the partial path segment close to the robot, within a feasibility checking horizon. If the path is infeasible, the tree parts associated with the path inside the horizon are pruned while maintaining the maximal tree structure of already-explored regions. Then, a multi-resolution utility map is created to capture the environmental information used to compute the replanning utility for each cell on the multi-scale tiling. A hierarchical search is applied on the map to find the sampling cell efficiently. Finally, uniform samples are drawn within the sampling cell for fast replanning. The SMARRT method is validated via simulation runs, and the results are compared against four existing methods. SMARRT yields significant improvements in travel time, replanning time, and success rate over these methods.
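The horizon-limited feasibility check described above can be sketched as follows. This is an illustrative simplification, not the SMARRT implementation: obstacles are assumed to be static circles given as `(center, radius)` pairs, obstacle velocity is ignored, and the function names are hypothetical.

```python
import math


def segment_hits_circle(p, q, center, radius):
    """Return True if the segment p-q comes within `radius` of `center`."""
    px, py = p
    qx, qy = q
    cx, cy = center
    dx, dy = qx - px, qy - py
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: p and q coincide
    else:
        # Project the center onto the segment and clamp to its endpoints.
        t = max(0.0, min(1.0, ((cx - px) * dx + (cy - py) * dy) / seg_len_sq))
    nearest = (px + t * dx, py + t * dy)
    return math.dist(nearest, center) <= radius


def first_blocked_segment(path, obstacles, horizon):
    """Index of the first colliding path segment that starts within `horizon`
    path length of the robot (assumed to be at path[0]), or None if the
    checked portion is free."""
    travelled = 0.0
    for i in range(len(path) - 1):
        if travelled >= horizon:
            break  # segments beyond the horizon are deliberately not checked
        p, q = path[i], path[i + 1]
        if any(segment_hits_circle(p, q, c, r) for c, r in obstacles):
            return i
        travelled += math.dist(p, q)
    return None
```

A returned index marks where pruning and replanning would begin; a result of `None` means the near-term path is still feasible, even if an obstacle currently blocks a segment farther along.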

RO Aug 3, 2021
CPPNet: A Coverage Path Planning Network

Zongyuan Shen, Palash Agrawal, James P. Wilson et al.

This paper presents a deep-learning-based coverage path planning (CPP) algorithm, called Coverage Path Planning Network (CPPNet). CPPNet is built using a convolutional neural network (CNN) whose input is a graph-based representation of the occupancy grid map and whose output is an edge probability heat graph, where the value of each edge is the probability that it belongs to the optimal TSP tour. A greedy search is then used to select the final optimized tour. CPPNet is trained and comparatively evaluated against the TSP tour. It is shown that CPPNet provides near-optimal solutions while requiring significantly less computational time, thus enabling real-time coverage path planning in partially unknown and dynamic environments.
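The abstract does not specify the greedy search, so the following is only one plausible reading: starting from a given node, repeatedly move to the unvisited node whose connecting edge has the highest predicted probability. The dense matrix `prob` stands in for the network's edge probability output and is a hypothetical input format.

```python
def greedy_tour(prob, start=0):
    """Build a visiting order from a square matrix of edge probabilities.

    prob[i][j] is the (assumed) predicted probability that edge i-j lies on
    the optimal tour. At each step the most probable edge to an unvisited
    node is followed; ties go to the lowest node index.
    """
    n = len(prob)
    tour = [start]
    visited = {start}
    while len(tour) < n:
        current = tour[-1]
        candidates = [j for j in range(n) if j not in visited]
        nxt = max(candidates, key=lambda j: prob[current][j])
        tour.append(nxt)
        visited.add(nxt)
    return tour


# Usage with a small, made-up probability matrix over four nodes.
example_prob = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.8, 0.1],
    [0.1, 0.8, 0.0, 0.7],
    [0.2, 0.1, 0.7, 0.0],
]
print(greedy_tour(example_prob))  # [0, 1, 2, 3]
```

The search costs O(n²) comparisons for n nodes, which is consistent with the abstract's emphasis on low computational time, though the actual selection rule in CPPNet may differ.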

RO Apr 22, 2021
MRRT: Multiple Rapidly-Exploring Random Trees for Fast Online Replanning in Dynamic Environments

Zongyuan Shen, James P. Wilson, Ryan Harvey et al.

This paper presents a novel algorithm, called MRRT, which uses multiple rapidly-exploring random trees for fast online replanning of autonomous vehicles in dynamic environments with moving obstacles. The proposed algorithm is built upon the RRT algorithm with a multi-tree structure. At the beginning, the RRT algorithm is applied to find the initial solution based on partial knowledge of the environment. Then, the robot starts to execute this path. At each iteration, the new obstacle configurations are collected by the robot's sensor and used to replan the path. This new information can come from unknown static obstacles (e.g., seafloor layout) as well as moving obstacles. To accommodate the environmental changes, two procedures are adopted: 1) edge pruning, and 2) tree regrowing. The edge pruning procedure checks the collision status throughout the tree and removes only the invalid edges while maintaining the tree structure of already-explored regions. Due to the removal of invalid edges, the tree may be broken into multiple disjoint trees. The RRT algorithm is therefore applied to regrow the trees: a sample is created randomly and joined to all the disjoint trees in its local neighborhood by connecting to their nearest nodes. Finally, a new solution is found for the robot. The advantages of the proposed MRRT algorithm are as follows: i) it retains the maximal tree structure by pruning only the edges that collide with obstacles, ii) it guarantees probabilistic completeness, and iii) it is computationally efficient for fast replanning, since all disjoint trees are maintained for future connections and expanded simultaneously.
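The edge pruning step, and the way it splits one tree into several disjoint trees, can be sketched as below. This is an illustrative sketch rather than the authors' implementation: the tree is assumed to be stored as a child-to-parent map, and `edge_is_valid` is a hypothetical callback standing in for the collision check.

```python
def prune_edges(parent, edge_is_valid):
    """Remove only the invalid edges from a tree stored as {node: parent}.

    A node whose edge to its parent is invalid becomes the root of its own
    subtree (parent set to None); all other edges are kept unchanged.
    """
    return {
        node: (par if par is not None and edge_is_valid(par, node) else None)
        for node, par in parent.items()
    }


def split_into_trees(parent):
    """Group nodes by the root they reach, giving the disjoint trees."""
    def root_of(node):
        while parent[node] is not None:
            node = parent[node]
        return node

    trees = {}
    for node in parent:
        trees.setdefault(root_of(node), set()).add(node)
    return trees


# Usage: a chain a -> b -> c -> d where the edge b-c now collides.
tree = {"a": None, "b": "a", "c": "b", "d": "c"}
blocked = {("b", "c")}
pruned = prune_edges(tree, lambda p, c: (p, c) not in blocked)
print(split_into_trees(pruned))  # two trees, rooted at "a" and at "c"
```

Keeping the subtree rooted at `"c"` instead of discarding it is what lets the regrowing step reconnect already-explored regions with a single new sample.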