Small Language Models as Compiler Experts: Auto-Parallelization for Heterogeneous Systems
The paper addresses compiler optimization for heterogeneous systems and shows an incremental improvement over existing methods such as LLVM Polly, TVM, and Triton.
The paper tackles auto-parallelizing compilation for heterogeneous systems by evaluating small language models (approximately 1B parameters) as reasoning engines. Across 376 evaluations it reports an average speedup of 6.81x and a peak speedup of 43.25x on convolution operations.
Traditional auto-parallelizing compilers rely on rigid heuristics and struggle with the complexity of modern heterogeneous systems. This paper presents a comprehensive evaluation of compiler auto-parallelization driven by small language models (approximately 1B parameters). We evaluate three models (gemma3, llama3.2, and qwen2.5) using six reasoning strategies across 11 real-world kernels drawn from scientific computing, graph algorithms, and machine learning. Our system is benchmarked against strong compiler baselines, including LLVM Polly, TVM, and Triton. Across 376 total evaluations, the proposed approach achieves an average speedup of 6.81x and a peak speedup of 43.25x on convolution operations. We analyze scalability, verify correctness using multiple sanitizers, and confirm robustness across diverse compilers and hardware platforms. Our results demonstrate that small, efficient language models can serve as powerful reasoning engines for complex compiler optimization tasks.
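As a reading aid for the headline numbers, the sketch below shows one common way to turn per-evaluation timings into an average and a peak speedup. It is an assumption-laden illustration, not the paper's measurement harness. The abstract does not say whether the 6.81x average is an arithmetic or a geometric mean, so both are computed. The function names `speedup` and `summarize` and the timing values are hypothetical.

```python
from statistics import fmean, geometric_mean


def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Speedup of the optimized kernel relative to the baseline (higher is better)."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / optimized_seconds


def summarize(timings):
    """Aggregate (baseline, optimized) timing pairs into mean, geomean, and peak speedup."""
    ratios = [speedup(base, opt) for base, opt in timings]
    return {
        "mean": fmean(ratios),              # arithmetic mean of the ratios
        "geomean": geometric_mean(ratios),  # less sensitive to a single large outlier
        "peak": max(ratios),                # best single evaluation
    }


# Hypothetical timings in seconds; these are NOT results from the paper.
example_timings = [(2.0, 1.0), (8.0, 1.0), (4.0, 1.0)]
summary = summarize(example_timings)
print(summary)
```

When one evaluation dominates, as a 43.25x peak might, the arithmetic and geometric means can differ noticeably, so it matters which one an "average speedup" refers to.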