SEAI | Feb 6

RuleFlow: Generating Reusable Program Optimizations with LLMs

arXiv:2602.09051v1 | h-index: 2 | Has Code
Originality: Highly original
AI Analysis

This work addresses the problem that per-program, LLM-based optimization of Pandas code is unreliable and expensive, offering a reusable and efficient alternative.

The paper tackles the challenge of optimizing Pandas programs by introducing RuleFlow, a hybrid approach that uses LLMs to discover optimizations and a compiler to deploy them, keeping the two stages decoupled. On PandasBench, it achieves speedups of up to 4.3x over the previous compiler-based SOTA and 1914.9x over the previous systems-based SOTA.

Optimizing Pandas programs is a challenging problem. Existing systems-based and compiler-based approaches offer reliability but are either heavyweight or support only a limited set of optimizations. Conversely, using LLMs to optimize each program individually can synthesize nontrivial optimizations, but is unreliable, expensive, and offers a low yield. In this work, we introduce RuleFlow, a hybrid approach that works in three stages, decoupling discovery from deployment and connecting them via a novel bridge. First, it discovers per-program optimizations (discovery). Second, these optimizations are converted into generalised rewrite rules (bridge). Finally, the rules are incorporated into a compiler that automatically applies them wherever applicable, eliminating repeated reliance on LLMs (deployment). We demonstrate that RuleFlow is the new state-of-the-art (SOTA) Pandas optimization framework on PandasBench, a challenging Pandas benchmark consisting of Python notebooks. Across these notebooks, we achieve speedups of up to 4.3x over Dias, the previous compiler-based SOTA, and 1914.9x over Modin, the previous systems-based SOTA. Our code is available at https://github.com/ADAPT-uiuc/RuleFlow.
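To make the bridge and deployment stages concrete, here is a minimal sketch of what a generalised rewrite rule and a compiler pass that applies it might look like, built only on Python's standard `ast` module. Everything in it is an assumption for illustration: `RewriteRule`, `RuleApplier`, `apply_rules`, the `_D`/`_C`/`_N` metavariable convention, and the example rule itself are hypothetical and are not RuleFlow's actual rule format (see the linked repository for that). The example rule rewrites `df.sort_values(col).head(n)` into `df.nsmallest(n, col)`, which pandas documents as a more performant equivalent; note that `nsmallest` only works on numeric and comparable dtypes, and this sketch does not check such preconditions.

```python
import ast
import copy
from dataclasses import dataclass


@dataclass
class RewriteRule:
    """Hypothetical rule format: `pattern` and `replacement` are Python
    expressions in which the names in `metavars` act as placeholders."""
    pattern: str
    replacement: str
    metavars: frozenset

    def match(self, pat, node, env):
        # A metavariable matches any subtree, but repeated uses must agree.
        if isinstance(pat, ast.Name) and pat.id in self.metavars:
            if pat.id in env:
                return ast.dump(env[pat.id]) == ast.dump(node)
            env[pat.id] = node
            return True
        if type(pat) is not type(node):
            return False
        for field in pat._fields:
            if field == "ctx":
                continue  # Load/Store context does not affect matching
            a, b = getattr(pat, field), getattr(node, field, None)
            if isinstance(a, list):
                if not isinstance(b, list) or len(a) != len(b):
                    return False
                if not all(self._match_value(x, y, env) for x, y in zip(a, b)):
                    return False
            elif not self._match_value(a, b, env):
                return False
        return True

    def _match_value(self, a, b, env):
        if isinstance(a, ast.AST):
            return isinstance(b, ast.AST) and self.match(a, b, env)
        return a == b  # plain values: identifiers, attribute names, None

    def substitute(self, env):
        # Build the replacement tree, filling each metavariable from `env`.
        tree = ast.parse(self.replacement, mode="eval").body

        class Fill(ast.NodeTransformer):
            def visit_Name(self, node):
                return copy.deepcopy(env[node.id]) if node.id in env else node

        return Fill().visit(tree)


class RuleApplier(ast.NodeTransformer):
    """Bottom-up pass that replaces every subtree matching a rule's pattern."""

    def __init__(self, rules):
        super().__init__()
        self.rules = [(r, ast.parse(r.pattern, mode="eval").body) for r in rules]

    def visit(self, node):
        node = self.generic_visit(node)  # rewrite children first
        for rule, pat in self.rules:
            env = {}
            if rule.match(pat, node, env):
                return ast.copy_location(rule.substitute(env), node)
        return node


def apply_rules(source, rules):
    tree = RuleApplier(rules).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # Python 3.9+


# Illustrative rule only; preconditions (e.g. numeric column) are not checked.
SORT_HEAD_TO_NSMALLEST = RewriteRule(
    pattern="_D.sort_values(_C).head(_N)",
    replacement="_D.nsmallest(_N, _C)",
    metavars=frozenset({"_D", "_C", "_N"}),
)

if __name__ == "__main__":
    src = "top = df.sort_values('price').head(5)\nn = len(df)"
    print(apply_rules(src, [SORT_HEAD_TO_NSMALLEST]))
    # top = df.nsmallest(5, 'price')
    # n = len(df)
```

Because keyword arguments are part of the pattern, a call such as `df.sort_values('a', ascending=False).head(3)` is left untouched, which is the desired behaviour here: a descending sort would correspond to `nlargest`, not `nsmallest`. Once a rule like this exists, applying it requires no LLM call, which is the reuse the deployment stage relies on.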
