LG · AI · MA · Mar 10

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

arXiv:2603.10085v149.95 citations · h-index: 16 · Has Code
Predicted impact: top 1% in LG · last 90 days · Originality: Highly original
AI Analysis

This paper addresses GPU kernel optimization for AI systems, offering a more interpretable and efficient approach than prior LLM-based methods.

The paper tackles the inefficiency and opacity of LLM-driven GPU kernel optimization. It introduces KernelSkill, a multi-agent framework with expert optimization skills and a dual-level memory, which achieves a 100% success rate on KernelBench Levels 1-3 and average speedups over Torch Eager of 5.44x, 2.82x, and 1.92x on Levels 1, 2, and 3, respectively.

Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque, implicitly learned heuristics within the LLMs to determine optimization strategies. This leads to inefficient trial-and-error and weakly interpretable optimizations. Our key insight is to replace implicit heuristics with expert optimization skills that are knowledge-driven and aware of task trajectories. Specifically, we present KernelSkill, a multi-agent framework with a dual-level memory architecture. KernelSkill operates by coordinating agents with long-term memory of reusable expert skills and short-term memory to prevent repetitive backtracking. On KernelBench Levels 1-3, KernelSkill achieves a 100% success rate and average speedups of 5.44x, 2.82x, and 1.92x over Torch Eager on Levels 1, 2, and 3, respectively, outperforming prior baselines. Code is available at https://github.com/0satan0/KernelMem/.
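The dual-level memory described in the abstract can be pictured with a minimal Python sketch. This is an illustration only and is not taken from the KernelSkill code: the class names `Skill` and `DualMemory`, the method `next_skill`, the skill names, and the bottleneck tags are all hypothetical. It assumes long-term memory is a list of reusable skills shared across tasks, and short-term memory is a per-run record of which skills were already tried on which kernel, so that the same attempt is not repeated.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Skill:
    """A reusable optimization skill (hypothetical structure)."""
    name: str        # e.g. "tile_shared_memory" (made-up label)
    applies_to: str  # bottleneck tag this skill targets (made-up label)


@dataclass
class DualMemory:
    """Assumed two-level memory: shared skills plus a per-run attempt log."""
    long_term: List[Skill]
    short_term: Set[Tuple[str, str]] = field(default_factory=set)

    def next_skill(self, kernel_id: str, bottleneck: str) -> Optional[Skill]:
        """Return the first matching skill not yet tried on this kernel."""
        for skill in self.long_term:
            key = (kernel_id, skill.name)
            if skill.applies_to == bottleneck and key not in self.short_term:
                self.short_term.add(key)  # record the attempt to avoid repeats
                return skill
        return None  # every matching skill was already tried


memory = DualMemory(long_term=[
    Skill("tile_shared_memory", "memory_bound"),
    Skill("vectorize_loads", "memory_bound"),
    Skill("fuse_ops", "launch_bound"),
])
first = memory.next_skill("k0", "memory_bound")
second = memory.next_skill("k0", "memory_bound")
third = memory.next_skill("k0", "memory_bound")
```

In this sketch, repeated calls for the same kernel walk through the matching skills once each and then return `None`, while a different kernel starts from the full skill list again. How the actual framework selects, ranks, or learns skills is not specified in the abstract and is not modeled here.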

