PLARApr 9

Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets

arXiv:2604.0790232.2h-index: 8
AI Analysis

This work addresses a specific performance bottleneck in compiler optimizations for division operations on modern 64-bit processors, with incremental improvements over prior methods.

The paper tackles the problem of optimizing 32-bit unsigned division by constants on 64-bit CPUs, where existing methods designed for 32-bit CPUs do not fully exploit 64-bit capabilities. It proposes a new optimization method, implemented as patches for LLVM/GCC, achieving speedups of 1.67x on Intel Xeon w9-3495X and 1.98x on Apple M4 in microbenchmarks.

Granlund and Montgomery proposed an optimization method for unsigned integer division by constants [3]. Their method (called the GM method in this paper) was further improved in part by works such as [1] and [7], and is now adopted by major compilers including GCC, Clang, Microsoft Compiler, and Apple Clang. However, for example, for x/7, the generated code is designed for 32-bit CPUs and therefore does not fully exploit 64-bit capabilities. This paper proposes an optimization method for 32-bit unsigned division by constants targeting 64-bit CPUs. We implemented patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X (Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the microbenchmark described later. The LLVM patch has already been merged into llvm:main [6], demonstrating the practical applicability of the proposed method.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes