AR · Mar 9

GOMA: Geometrically Optimal Mapping via Analytical Modeling for Spatial Accelerators

Wulve Yang, Hailong Zou, Rui Zhou, Jionghao Zhang, Qiang Li, Gang Li, Yi Zhan, Shushan Qiao

arXiv:2603.07962v1

Predicted impact: top 52% in AR (last 90 days) · Originality: highly original

AI Analysis

This work gives hardware designers and ML engineers a method for efficiently optimizing GEMM mappings on spatial accelerators, which is crucial for the performance and energy efficiency of large language models and other compute-intensive workloads. By the authors' reported results, it improves on existing mappers in both mapping quality and search time.

This paper addresses the challenge of finding optimal mappings for General Matrix Multiplication (GEMM) on spatial accelerators, which is critical for energy efficiency and execution speed. The authors developed GOMA, a framework that uses a geometric abstraction and analytical modeling to quickly identify globally optimal mappings. GOMA improves the energy-delay product by 2.24-4.24x and accelerates time-to-solution by 3.83-73.6x compared to state-of-the-art mappers.

General matrix multiplication (GEMM) on spatial accelerators is highly sensitive to mapping choices, in both execution efficiency and energy consumption. However, the mapping space grows combinatorially, which makes it extremely challenging to obtain optimal mappings within an acceptable time budget. Existing approaches typically lack global-optimality guarantees and become prohibitively slow as the mapping space grows. To address these limitations, we propose GOMA, a globally optimal GEMM mapping framework built on a geometric abstraction and analytical modeling, which solves efficiently while guaranteeing optimality. GOMA introduces, from first principles, a geometric abstraction for GEMM mapping, yielding an exact analytical energy objective with $O(1)$ evaluation for any given mapping. The objective is highly accurate. GOMA then formulates mapping selection as an integer optimization problem under hardware and mapping constraints, using the analytical energy model as the objective to automate mapping search. GOMA can quickly compute a globally optimal mapping for any (GEMM workload, target hardware) pair, achieving this for the first time in mapping space exploration. Experiments confirm that, across representative accelerators and large language model prefill workloads, GOMA improves the energy-delay product (EDP) by $2.24$-$4.24\times$ over SOTA mappers while accelerating time-to-solution by $3.83$-$73.6\times$.
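
To make the idea of "mapping selection as integer optimization over a closed-form energy objective" concrete, here is a minimal Python sketch. It is not GOMA's model or solver: the two-level memory hierarchy, the output-stationary loop order, the energy constants `e_dram` and `e_mac`, and every function name are assumptions chosen for illustration, and exhaustive enumeration over tile sizes stands in for a real integer-optimization solver. It only shows that each candidate mapping can be scored in constant time and that the search is over integer tile sizes subject to a buffer-capacity constraint.

```python
from itertools import product


def divisors(n):
    """All positive divisors of n, used as candidate tile sizes."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dram_traffic(M, N, K, tm, tn, tk):
    """Closed-form DRAM word count for C[M,N] = A[M,K] @ B[K,N].

    Toy assumption: output-stationary tiling, so each C tile stays
    on chip while the K dimension is swept. tk does not change the
    traffic in this toy model; it only affects the buffer footprint.
    """
    a_reads = M * K * (N // tn)   # A is re-read once per column tile of C
    b_reads = K * N * (M // tm)   # B is re-read once per row tile of C
    c_writes = M * N              # each output element is written once
    return a_reads + b_reads + c_writes


def energy(M, N, K, tm, tn, tk, e_dram=200.0, e_mac=1.0):
    """O(1) energy estimate; e_dram and e_mac are made-up unit costs."""
    return e_dram * dram_traffic(M, N, K, tm, tn, tk) + e_mac * M * N * K


def best_mapping(M, N, K, capacity):
    """Exhaustively search tile sizes that fit in `capacity` buffer words.

    Returns (energy, (tm, tn, tk)) for the cheapest feasible mapping,
    or None if no tiling fits.
    """
    best = None
    for tm, tn, tk in product(divisors(M), divisors(N), divisors(K)):
        footprint = tm * tk + tk * tn + tm * tn  # A, B and C tiles on chip
        if footprint > capacity:
            continue
        cost = energy(M, N, K, tm, tn, tk)
        if best is None or cost < best[0]:
            best = (cost, (tm, tn, tk))
    return best


if __name__ == "__main__":
    print(best_mapping(64, 64, 64, capacity=2048))
```

Because the objective is evaluated in constant time, the cost of the search is dominated by the number of candidates. Brute force is only viable here because the toy space has three tile dimensions; the abstract's point is that realistic mapping spaces are far too large for this, which is why a solver-based formulation matters.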
