Xinliang David Li

2papers

2 Papers

74.6SEMay 28
AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve

Chaitanya Mamatha Ananda, Rajiv Gupta, Mircea Trofin et al.

Post-link optimizers (PLOs) such as Propeller and BOLT have demonstrated that precise, profile-guided code layout can extract significant performance gains from heavily optimized binaries. However, these systems are currently restricted to intraprocedural techniques, leaving the global potential of interprocedural layout largely untapped. Interprocedural code layout is historically difficult due to a combinatorially intractable search space and complex call-return semantics that are challenging to model. Consequently, the performance potential of fine-grained interprocedural layout remains unproven in practice. AI-PROPELLER uses Magellan, an agentic workflow that evolves the compiler heuristic in Propeller into a fine-grained interprocedural optimizer and fine-tunes the resulting policy hyperparameters. To ensure high-fidelity, we move away from approximate static cost models and the agentic workflow generates multiple layout variants that are executed on actual hardware to measure real performance counters, providing a precise reward signal for the evolutionary loop. AI-PROPELLER has been evaluated on several benchmarks including large warehouse-scale applications and experiments show performance improvements of 0.23% to 1.6% optimized with state-of-the-art FDO and PLO which is significant for real-world binaries. This is the first time ever that large warehouse-scale applications in industrial settings have been optimized with fine-grained interprocedural code layout.

LGFeb 10, 2022
Learning Branch Probabilities in Compiler from Datacenter Workloads

Easwaran Raman, Xinliang David Li

Estimating the probability with which a conditional branch instruction is taken is an important analysis that enables many optimizations in modern compilers. When using Profile Guided Optimizations (PGO), compilers are able to make a good estimation of the branch probabilities. In the absence of profile information, compilers resort to using heuristics for this purpose. In this work, we propose learning branch probabilities from a large corpus of data obtained from datacenter workloads. Using metrics including Root Mean Squared Error, Mean Absolute Error and cross-entropy, we show that the machine learning model improves branch probability estimation by 18-50% in comparison to compiler heuristics. This translates to performance improvement of up to 8.1% on 24 out of a suite of 40 benchmarks with a 1% geomean improvement on the suite. This also results in greater than 1.2% performance improvement in an important search application.