AR · Mar 31

Improving Instruction Fetch Efficiency via High-Level Program Map Traversal

arXiv:2406.06738
AI Analysis

The work addresses instruction-fetch bottlenecks in the processor front end, particularly for benchmarks with high miss rates, but it is an incremental improvement over existing prefetching methods.

The paper tackles instruction-fetch inefficiency by proposing instruction presending, which uses a high-level program map to move instruction cache blocks, BTB entries, and iTLB entries into the primary structures just in time. This reduces fetch-wait cycles by an order of magnitude compared with state-of-the-art prefetching schemes.

Efficient instruction fetch is critical to performance. It requires the primary structures, namely the L1 instruction cache (L1i), the branch target buffer (BTB), and the instruction TLB (iTLB), to hold the requisite information when it is needed. This paper proposes instruction presending, which traverses a high-level program map to identify instruction cache blocks, BTB entries, and iTLB entries and move them from the secondary to the primary structures in a "just in time" fashion. Empirical results demonstrate the efficacy of the proposed presending scheme. Compared with state-of-the-art instruction prefetching schemes, presending reduces the number of cycles in which instruction fetch is waiting by an order of magnitude while operating with small primary BTBs. It is especially effective for benchmarks with a high base MPKI (misses per kilo-instruction), where movement from secondary to primary structures is frequent. This improved fetch efficiency translates into performance gains for workloads where fetch efficiency matters.
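The presending idea can be illustrated with a toy model. The sketch below is not the authors' design. The program map format (the hypothetical `PROGRAM_MAP`, with one mapped successor per region), the structure sizes, fully associative LRU replacement, and the one-region lookahead are all assumptions chosen for illustration. Secondary structures are modeled as an always-hitting backing store. Latency is ignored, and each primary miss (including a BTB miss) is counted as one fetch-wait event rather than measured in cycles.

```python
from collections import OrderedDict


class PrimaryStructure:
    """Small fully associative LRU structure (stand-in for L1i, BTB, or iTLB)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()

    def lookup(self, key) -> bool:
        """Return True on a hit (refreshing recency), False on a miss."""
        if key in self._entries:
            self._entries.move_to_end(key)
            return True
        return False

    def fill(self, key) -> None:
        """Move an entry in from the (implicit) secondary structure, evicting LRU if full."""
        if key in self._entries:
            self._entries.move_to_end(key)
            return
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)
        self._entries[key] = True


# Hypothetical program map: region -> (i-cache block addresses, branch PCs, code page, mapped successor)
PROGRAM_MAP = {
    "A": ([0x1000, 0x1040], [0x1038], 0x1, "B"),
    "B": ([0x2000, 0x2040], [0x2030], 0x2, "C"),
    "C": ([0x3000], [0x3010], 0x3, "A"),
}


def region_state(region, l1i, btb, itlb):
    """Pair each primary structure with the keys a region needs from it."""
    blocks, branches, page, _ = PROGRAM_MAP[region]
    return ((itlb, [page]), (l1i, blocks), (btb, branches))


def count_fetch_waits(trace, presend: bool, sizes=(2, 1, 1)) -> int:
    """Count primary misses seen by fetch, a crude proxy for fetch-wait events."""
    l1i, btb, itlb = (PrimaryStructure(n) for n in sizes)
    waits = 0
    for region in trace:
        # Demand path: every primary miss stalls fetch while the entry is moved in.
        for struct, keys in region_state(region, l1i, btb, itlb):
            for key in keys:
                if not struct.lookup(key):
                    waits += 1
                    struct.fill(key)
        if presend:
            # Presend path: take one step along the map and move the successor's
            # blocks and entries into the primaries just before fetch reaches it.
            successor = PROGRAM_MAP[region][3]
            for struct, keys in region_state(successor, l1i, btb, itlb):
                for key in keys:
                    struct.fill(key)
    return waits


trace = ["A", "B", "C"] * 4
print(count_fetch_waits(trace, presend=False))  # 44
print(count_fetch_waits(trace, presend=True))   # 4
```

On this cyclic trace, the working set exceeds the tiny primaries, so demand fetch misses on every access. With presending, only the four cold-start misses remain. Dividing a miss count by the number of instructions executed and multiplying by 1000 gives MPKI, the metric the abstract uses to characterize benchmarks.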
