DCAI Nov 9, 2025

PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization

arXiv:2511.06345v1, 8 citations, h-index: 22
Originality: Incremental advance
AI Analysis

The work addresses the need for more efficient kernel design in high-performance computing, but the advance is incremental: it builds on existing AI kernel generation methods.

The paper tackles automated kernel optimization by introducing PRAGMA, a profile-guided AI framework that integrates hardware profiling into the reasoning loop. It reports average speedups of 2.81× on CPU and 2.30× on GPU over Torch.

Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel generation, yet most existing systems rely solely on correctness or execution-time feedback and cannot reason about low-level performance bottlenecks. In this paper, we introduce PRAGMA, a profile-guided AI kernel generation framework that integrates execution feedback and fine-grained hardware profiling into the reasoning loop. PRAGMA enables LLMs to identify performance bottlenecks, preserve historical best versions, and iteratively refine code quality. We evaluate PRAGMA on KernelBench, covering GPU and CPU backends. Results show that PRAGMA consistently outperforms the baseline AIKG without profiling enabled and achieves average speedups of 2.81× and 2.30× against Torch on CPU and GPU platforms, respectively.
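
The loop the abstract describes (generate a kernel, check it, profile it, feed the profile back, keep the best version so far) can be sketched roughly as follows. This is a minimal illustration and not PRAGMA's actual implementation: `Candidate`, `refine`, `generate`, `evaluate`, and the `cache_miss_rate` metric are hypothetical names, and the LLM and the hardware profiler are replaced by toy stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Profile = Dict[str, float]


@dataclass
class Candidate:
    """One correct kernel version with its measured runtime and profile."""
    source: str
    runtime_ms: float
    profile: Profile


def refine(
    generate: Callable[[Optional[str], Profile], str],
    evaluate: Callable[[str], Optional[Tuple[float, Profile]]],
    rounds: int = 5,
) -> Optional[Candidate]:
    """Iteratively refine a kernel, preserving the historical best version.

    `generate` stands in for the LLM call (hypothetical); it receives the best
    source so far and the latest feedback. `evaluate` stands in for the
    correctness check plus profiler (hypothetical); it returns None when the
    kernel is incorrect, else (runtime_ms, profile metrics).
    """
    best: Optional[Candidate] = None
    feedback: Profile = {}
    for _ in range(rounds):
        source = generate(best.source if best else None, feedback)
        result = evaluate(source)
        if result is None:
            # Incorrect kernel: report the failure and retry from the best version.
            feedback = {"error": 1.0}
            continue
        runtime_ms, profile = result
        feedback = profile  # profiling data drives the next generation step
        if best is None or runtime_ms < best.runtime_ms:
            best = Candidate(source, runtime_ms, profile)
    return best


def demo() -> Optional[Candidate]:
    """Toy run: scripted runtimes replace real kernels and a real profiler."""
    runtimes = iter([10.0, None, 4.0, 7.0])  # None marks an incorrect kernel
    counter = {"n": 0}

    def generate(prev: Optional[str], feedback: Profile) -> str:
        counter["n"] += 1
        return f"kernel_v{counter['n']}"

    def evaluate(source: str) -> Optional[Tuple[float, Profile]]:
        t = next(runtimes)
        if t is None:
            return None
        return t, {"cache_miss_rate": t / 100.0}  # made-up metric

    return refine(generate, evaluate, rounds=4)
```

In the toy run, the third candidate is kept because the slower fourth candidate does not replace it. That is the "preserve historical best versions" behaviour in its simplest form. A real system would presumably pass richer profiler output, such as hardware counters, into the prompt.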

