PFAI, Nov 5, 2025

PerfDojo: Automated ML Library Generation for Heterogeneous Architectures

arXiv:2511.03586v1, 1 citation, h-index: 19, SC
Originality: Highly original
AI Analysis

This paper addresses performance portability for ML developers on heterogeneous hardware, and it presents a novel method rather than an incremental improvement.

The paper tackles the challenge of optimizing machine learning performance across diverse hardware architectures. It introduces PerfLLM, an automated optimization methodology that combines Large Language Models and Reinforcement Learning, and reports significant performance gains on CPU (x86, Arm, RISC-V) and GPU architectures.

The increasing complexity of machine learning models and the proliferation of diverse hardware architectures (CPUs, GPUs, accelerators) make achieving optimal performance a significant challenge. Heterogeneity in instruction sets, specialized kernel requirements for different data types and model features (e.g., sparsity, quantization), and architecture-specific optimizations complicate performance tuning. Manual optimization is resource-intensive, while existing automatic approaches often rely on complex hardware-specific heuristics and uninterpretable intermediate representations, hindering performance portability. We introduce PerfLLM, a novel automatic optimization methodology leveraging Large Language Models (LLMs) and Reinforcement Learning (RL). Central to this is PerfDojo, an environment framing optimization as an RL game using a human-readable, mathematically-inspired code representation that guarantees semantic validity through transformations. This allows effective optimization without prior hardware knowledge, facilitating both human analysis and RL agent training. We demonstrate PerfLLM's ability to achieve significant performance gains across diverse CPU (x86, Arm, RISC-V) and GPU architectures.
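To make the idea of "optimization as a game over semantics-preserving transformations" concrete, here is a minimal toy sketch in Python. It is not PerfDojo's representation or API. The names `ToyLoopEnv`, `proxy_cost`, `run`, and `best_sequence` are hypothetical, the state is just the loop order of a matrix multiplication, the only action is swapping two adjacent loops, and the reward comes from an assumed stride-based proxy cost rather than a measured runtime. A brute-force search over short action sequences stands in for a trained RL agent.

```python
from itertools import product

# Row-major index tuples for the body C[i][j] += A[i][k] * B[k][j].
ACCESSES = {"A": ("i", "k"), "B": ("k", "j"), "C": ("i", "j")}


def proxy_cost(order, n=64):
    """Assumed stand-in for measured runtime: penalize strided innermost accesses."""
    inner = order[-1]
    cost = 0
    for idx in ACCESSES.values():
        if inner not in idx:
            continue  # access does not change in the innermost loop
        cost += 1 if idx[-1] == inner else n  # unit stride vs. stride n
    return cost


def run(order, A, B):
    """Execute the loop nest in the given loop order (integer inputs keep it exact)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for values in product(range(n), repeat=3):
        v = dict(zip(order, values))
        C[v["i"]][v["j"]] += A[v["i"]][v["k"]] * B[v["k"]][v["j"]]
    return C


class ToyLoopEnv:
    """Hypothetical game: state = loop order, action a = swap loops a and a + 1."""

    ACTIONS = (0, 1)

    def reset(self):
        self.order = ("j", "k", "i")  # deliberately poor starting order
        return self.order

    def step(self, action):
        before = proxy_cost(self.order)
        loops = list(self.order)
        loops[action], loops[action + 1] = loops[action + 1], loops[action]
        self.order = tuple(loops)
        # Reward is the cost reduction achieved by this transformation.
        return self.order, before - proxy_cost(self.order)


def best_sequence(env, max_steps=3):
    """Exhaustive search over short action sequences (placeholder for an RL agent)."""
    best = (proxy_cost(env.reset()), ())
    for length in range(1, max_steps + 1):
        for seq in product(env.ACTIONS, repeat=length):
            env.reset()
            for action in seq:
                env.step(action)
            best = min(best, (proxy_cost(env.order), seq))
    return best


if __name__ == "__main__":
    cost, seq = best_sequence(ToyLoopEnv())
    print("best proxy cost:", cost, "via swaps:", seq)
```

Because every action is a loop interchange on an accumulation with integer inputs, each reachable state computes the same result, which `run` can confirm by executing two different orders on the same data. In this toy setup the first swap alone yields no reward and only the two-swap sequence `(0, 1)` reaches the cheapest order, which hints at why such games call for search or learning rather than purely greedy moves.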

